Python pyspark DataFrame.truncate Usage and Code Examples

This article briefly introduces the usage of pyspark.pandas.DataFrame.truncate.

Usage: DataFrame.truncate(before: Optional[Any] = None, after: Optional[Any] = None, axis: Union[int, str, None] = None, copy: bool = True) → Union[DataFrame, Series]

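Truncate a Series or DataFrame before and after some index value.

This is a useful shorthand for boolean indexing based on index values above or below certain thresholds.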
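The shorthand can be pictured with a minimal plain-Python sketch. It models an index as a list of (label, value) pairs and keeps the labels inside the inclusive range, which matches the examples on this page. `truncate_rows` is a hypothetical helper for illustration only; it is not part of pyspark and does not reproduce its implementation.

```python
def truncate_rows(rows, before=None, after=None):
    """Keep (label, value) pairs whose label lies in [before, after].

    A bound left as None is treated as open on that side (an assumption
    mirroring the None defaults in the signature above).
    """
    return [
        (label, value)
        for label, value in rows
        if (before is None or label >= before)
        and (after is None or label <= after)
    ]


# Hypothetical data shaped like column 'A' of the example DataFrame.
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

kept = truncate_rows(rows, before=2, after=4)
print(kept)
```

Written this way, the call is simply a two-sided filter on the index labels, which is what the boolean-indexing description above refers to.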

Note

This API depends on Index.is_monotonic_increasing(), which can be expensive.
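To see why a monotonicity check can be costly, consider this plain-Python sketch: deciding whether labels are sorted means comparing every adjacent pair, so the whole index has to be examined, and on a distributed DataFrame that may require a pass over all the data. `labels_are_increasing` is a hypothetical helper for illustration, not pyspark's implementation.

```python
def labels_are_increasing(labels):
    """Return True if each label is >= the one before it (ties allowed)."""
    # Every adjacent pair is compared, so the cost grows with the index size.
    return all(a <= b for a, b in zip(labels, labels[1:]))


sorted_labels = [1, 2, 3, 4, 5]
unsorted_labels = [3, 1, 2]

print(labels_are_increasing(sorted_labels))
print(labels_are_increasing(unsorted_labels))
```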

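Parameters:

before: date, str, int: Truncate all rows before this index value.
after: date, str, int: Truncate all rows after this index value.
axis: {0 or 'index', 1 or 'columns'}, optional: Axis to truncate. Truncates the index (rows) by default.
copy: bool, default True: Return a copy of the truncated section.

Returns: type of caller: The truncated Series or DataFrame.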

Examples:

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
...                    'B': ['f', 'g', 'h', 'i', 'j'],
...                    'C': ['k', 'l', 'm', 'n', 'o']},
...                   index=[1, 2, 3, 4, 5])
>>> df
   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o

>>> df.truncate(before=2, after=4)
   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n

The columns of a DataFrame can be truncated.

>>> df.truncate(before="A", after="B", axis="columns")
   A  B
1  a  f
2  b  g
3  c  h
4  d  i
5  e  j

For a Series, only rows can be truncated.

>>> df['A'].truncate(before=2, after=4)
2    b
3    c
4    d
Name: A, dtype: object

A Series whose index is sorted integers.

>>> s = ps.Series([10, 20, 30, 40, 50, 60, 70],
...               index=[1, 2, 3, 4, 5, 6, 7])
>>> s
1    10
2    20
3    30
4    40
5    50
6    60
7    70
dtype: int64

>>> s.truncate(2, 5)
2    20
3    30
4    40
5    50
dtype: int64

A Series whose index is sorted strings.

>>> s = ps.Series([10, 20, 30, 40, 50, 60, 70],
...               index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> s
a    10
b    20
c    30
d    40
e    50
f    60
g    70
dtype: int64

>>> s.truncate('b', 'e')
b    20
c    30
d    40
e    50
dtype: int64

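Note: This article was selected and compiled by 纯净天空 from the original English documentation for pyspark.pandas.DataFrame.truncate at spark.apache.org. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.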