Python pyspark GroupBy.cumcount usage and code examples

This article briefly introduces the usage of pyspark.pandas.groupby.GroupBy.cumcount.

Usage: GroupBy.cumcount(ascending: bool = True) → pyspark.pandas.series.Series

Number each item in each group from 0 to the length of that group - 1.

Essentially this is equivalent to

self.apply(lambda x: pd.Series(np.arange(len(x)), x.index))
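
The equivalence above can be checked with a small sketch. It assumes pandas and NumPy are installed and uses plain pandas rather than pyspark.pandas (an assumption made so that it runs without a Spark session). The `group_keys=False` argument, the `["A"]` column selection, and the variable names are illustrative choices, not part of the original documentation.

```python
import numpy as np
import pandas as pd

# Plain pandas stands in for pyspark.pandas here (assumption, see lead-in).
df = pd.DataFrame({"A": ["a", "a", "a", "b", "b", "a"]})

# Built-in numbering of rows within each group.
builtin = df.groupby("A").cumcount()

# The apply-based form: one 0..len-1 range per group, indexed by the
# group's own row labels. group_keys=False keeps the original index.
via_apply = df.groupby("A", group_keys=False)["A"].apply(
    lambda x: pd.Series(np.arange(len(x)), x.index)
)

# Groups may come back concatenated in group order, so restore row order.
via_apply = via_apply.sort_index()

print(builtin.tolist() == via_apply.tolist())
```

Comparing the values as lists sidesteps differences in Series name and integer dtype between the two results.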

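Parameters:

ascending: bool, default True. If False, number in reverse, from the length of the group - 1 to 0.

Returns:

Series: Sequence number of each element within each group.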
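
To make the numbering rule concrete, here is a minimal pure-Python sketch of the same semantics. The `cumcount` function below is a hypothetical helper written for illustration; it is not the pyspark implementation, and it works on a plain list of group keys instead of a DataFrame.

```python
from collections import Counter


def cumcount(keys, ascending=True):
    """Number each key by its position within its own group (hypothetical helper)."""
    seen = Counter()
    forward = []
    for key in keys:
        # Position of this item among the items of its group seen so far.
        forward.append(seen[key])
        seen[key] += 1
    if ascending:
        return forward
    # After the loop, seen[key] is the group size; count down from size - 1 to 0.
    return [seen[key] - 1 - pos for key, pos in zip(keys, forward)]


keys = ["a", "a", "a", "b", "b", "a"]
print(cumcount(keys))                   # [0, 1, 2, 0, 1, 3]
print(cumcount(keys, ascending=False))  # [3, 2, 1, 1, 0, 0]
```

With `ascending=False` each item gets the group size minus one minus its forward position, so the last item of every group is numbered 0.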

Examples:

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']],
...                   columns=['A'])
>>> df
   A
0  a
1  a
2  a
3  b
4  b
5  a
>>> df.groupby('A').cumcount().sort_index()
0    0
1    1
2    2
3    0
4    1
5    3
dtype: int64
>>> df.groupby('A').cumcount(ascending=False).sort_index()
0    3
1    2
2    1
3    1
4    0
5    0
dtype: int64

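Note: This article was selected and compiled by 纯净天空 from the original English documentation at spark.apache.org, pyspark.pandas.groupby.GroupBy.cumcount. Unless otherwise stated, copyright of the original code belongs to the original author. Please do not reproduce or copy this translation without permission or authorization.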
