Python pyspark GroupBy.rank usage and code examples

This article briefly introduces the usage of pyspark.pandas.groupby.GroupBy.rank.

Usage: GroupBy.rank(method: str = 'average', ascending: bool = True) → FrameLike

Provides the rank of values within each group.

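Parameters:

method: {'average', 'min', 'max', 'first', 'dense'}, default 'average'

average: average rank of the group of tied values
min: lowest rank in the group of tied values
max: highest rank in the group of tied values
first: ranks assigned in the order the values appear in the array
dense: like 'min', but the rank always increases by 1 between groups of tied values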
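
To make the five tie-breaking rules concrete, here is a minimal pure-Python sketch of ranking within groups. It is an illustration only, not how pyspark implements the method; the helper name rank_within_groups and its list-based signature are assumptions made for this example.

```python
from collections import defaultdict


def rank_within_groups(keys, values, method="average"):
    """Rank each value among the values sharing its group key (ascending).

    Hypothetical helper for illustration; not part of pyspark.
    """
    # Collect the row positions that belong to each group key.
    groups = defaultdict(list)
    for pos, key in enumerate(keys):
        groups[key].append(pos)

    ranks = [None] * len(values)
    for positions in groups.values():
        # sorted() is stable, so equal values keep their original order.
        ordered = sorted(positions, key=lambda p: values[p])
        dense = 0
        i = 0
        while i < len(ordered):
            # ordered[i:j] is one run of tied values.
            j = i
            while j < len(ordered) and values[ordered[j]] == values[ordered[i]]:
                j += 1
            dense += 1
            lo, hi = i + 1, j  # 1-based ordinal ranks covered by this run
            for offset, p in enumerate(ordered[i:j]):
                if method == "average":
                    ranks[p] = (lo + hi) / 2
                elif method == "min":
                    ranks[p] = float(lo)
                elif method == "max":
                    ranks[p] = float(hi)
                elif method == "first":
                    ranks[p] = float(lo + offset)
                elif method == "dense":
                    ranks[p] = float(dense)
                else:
                    raise ValueError(f"unknown method: {method}")
            i = j
    return ranks


keys = [0, 0, 0, 0]
values = [10, 10, 20, 30]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank_within_groups(keys, values, m))
```

With the values 10, 10, 20, 30 in a single group, 'min' jumps from 1 to 3 after the tie, while 'dense' continues with 2. That gap is the only difference between the two methods.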

ascending: boolean, default True

If False, values are ranked from high (1) to low (N), so the largest value receives rank 1.
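
The effect of ascending can be sketched in a few lines of plain Python using the default 'average' rule on a single group. The function average_rank is a hypothetical helper written for this illustration, not a pyspark API.

```python
def average_rank(values, ascending=True):
    """Average rank of each value; ties share the mean of their ordinal ranks.

    Hypothetical helper for illustration; not part of pyspark.
    """
    # Sort low-to-high when ascending, high-to-low otherwise.
    order = sorted(values, reverse=not ascending)
    ranks = []
    for v in values:
        first = order.index(v) + 1         # first 1-based position of v
        last = first + order.count(v) - 1  # last 1-based position of v
        ranks.append((first + last) / 2)
    return ranks


group = [10, 20, 20]
print(average_rank(group, ascending=True))   # [1.0, 2.5, 2.5]
print(average_rank(group, ascending=False))  # [3.0, 1.5, 1.5]
```

Reversing the sort direction changes which end of the value range receives rank 1; tied values still share one averaged rank.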

Returns:

DataFrame with the ranking of values within each group

Examples:

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({
...     'a': [1, 1, 1, 2, 2, 2, 3, 3, 3],
...     'b': [1, 2, 2, 2, 3, 3, 3, 4, 4]}, columns=['a', 'b'])
>>> df
   a  b
0  1  1
1  1  2
2  1  2
3  2  2
4  2  3
5  2  3
6  3  3
7  3  4
8  3  4

>>> df.groupby("a").rank().sort_index()
     b
0  1.0
1  2.5
2  2.5
3  1.0
4  2.5
5  2.5
6  1.0
7  2.5
8  2.5

>>> df.b.groupby(df.a).rank(method='max').sort_index()
0    1.0
1    3.0
2    3.0
3    1.0
4    3.0
5    3.0
6    1.0
7    3.0
8    3.0
Name: b, dtype: float64


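Note: This article was selected and compiled by 純淨天空 from the original English documentation for pyspark.pandas.groupby.GroupBy.rank at spark.apache.org. Unless otherwise stated, copyright in the original code belongs to the original author. Do not reproduce or copy this translation without permission or authorization.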
