Python pyspark GroupBy.cumcount usage and code examples

This article briefly introduces the usage of pyspark.pandas.groupby.GroupBy.cumcount.

Usage: GroupBy.cumcount(ascending: bool = True) → pyspark.pandas.series.Series

Number each item in each group from 0 to the length of that group - 1.

Essentially this is equivalent to

self.apply(lambda x: pd.Series(np.arange(len(x)), x.index))

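Parameters:

ascending: bool, default True. If False, number in reverse, from the length of the group - 1 to 0.

Returns:

Series: Sequence number of each element within each group.

Examples: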

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']],
...                   columns=['A'])
>>> df
   A
0  a
1  a
2  a
3  b
4  b
5  a
>>> df.groupby('A').cumcount().sort_index()
0    0
1    1
2    2
3    0
4    1
5    3
dtype: int64
>>> df.groupby('A').cumcount(ascending=False).sort_index()
0    3
1    2
2    1
3    1
4    0
5    0
dtype: int64
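
The numbering rule can also be sketched in plain Python, without Spark, to make the two directions easier to follow. This is only an illustration under stated assumptions: the `cumcount` function below is a hypothetical helper written for this example, not part of pyspark, and it works on a plain list of group keys in row order rather than on a DataFrame.

```python
from collections import Counter, defaultdict


def cumcount(keys, ascending=True):
    """Number each item within its group, in order of appearance.

    Hypothetical helper, not part of pyspark: it only mirrors the
    numbering rule on a plain list of group keys.
    """
    seen = defaultdict(int)   # items of each group visited so far
    sizes = Counter(keys)     # total size of each group
    result = []
    for key in keys:
        position = seen[key]
        seen[key] += 1
        # ascending: 0 .. len(group) - 1; otherwise: len(group) - 1 .. 0
        result.append(position if ascending else sizes[key] - 1 - position)
    return result


keys = ['a', 'a', 'a', 'b', 'b', 'a']
forward = cumcount(keys)
backward = cumcount(keys, ascending=False)
print(forward)   # [0, 1, 2, 0, 1, 3]
print(backward)  # [3, 2, 1, 1, 0, 0]
```

Both printed lists match the values of the two Series in the examples above when read in index order.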

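Note: This article was selected and compiled by 純淨天空 from the original English work pyspark.pandas.groupby.GroupBy.cumcount on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original author. Please do not reproduce or copy this translation without permission or authorization.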