Python pyspark Series.count: usage and code examples

This article briefly introduces the usage of pyspark.pandas.Series.count.

Usage: Series.count(axis: Union[int, str, None] = None, numeric_only: bool = False) → Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, pyspark.pandas.series.Series]

Count the non-NA cells in each column.

The values None and NaN are considered NA.
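
As a rough illustration of the rule above, the following plain-Python sketch counts the entries of a list that are neither None nor NaN. The helper names `is_na` and `count_non_na` are hypothetical and are not part of pyspark; the sketch only mirrors the stated NA rule and says nothing about how pyspark computes the count internally.

```python
import math


def is_na(value):
    """Return True for None and float NaN, the two values treated as NA here."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_non_na(values):
    """Count the entries of `values` that are not NA (hypothetical helper)."""
    return sum(1 for v in values if not is_na(v))


ages = [24.0, float("nan"), 21.0, None, 26.0]
print(count_non_na(ages))  # 3: the NaN and the None are skipped
```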

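Parameters:

axis: {0 or 'index', 1 or 'columns'}, default 0. If 0 or 'index', counts are generated for each column. If 1 or 'columns', counts are generated for each row.
numeric_only: bool, default False. If True, include only float, int, and boolean columns. This parameter exists mainly for pandas compatibility.

Returns: a scalar for a Series, and a Series for a DataFrame.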

Examples:

Constructing a DataFrame from a dictionary:

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]},
...                   columns=["Person", "Age", "Single"])
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2  Lewis  21.0    True
3   John  33.0    True
4   Myla  26.0   False

Note that NA values are not counted:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

>>> df.count(axis=1)
0    3
1    2
2    3
3    3
4    3
dtype: int64
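
The per-column and per-row counts above can be reproduced with a small plain-Python sketch over the same data. The function `count_by_axis` and the helper `is_na` are hypothetical names used only for illustration; this is a way to check the axis semantics by hand, not a description of what pyspark does internally.

```python
import math


def is_na(value):
    """Return True for None and float NaN (the values treated as NA)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Same data as the DataFrame in the example, stored column by column.
data = {
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, float("nan"), 21.0, 33.0, 26.0],
    "Single": [False, True, True, True, False],
}


def count_by_axis(columns, axis=0):
    """Count non-NA cells per column (axis=0) or per row (axis=1)."""
    if axis in (0, "index"):
        return {name: sum(not is_na(v) for v in values)
                for name, values in columns.items()}
    if axis in (1, "columns"):
        # zip(*...) turns the column lists into row tuples.
        return [sum(not is_na(v) for v in row)
                for row in zip(*columns.values())]
    raise ValueError(f"unsupported axis: {axis!r}")


print(count_by_axis(data))          # {'Person': 5, 'Age': 4, 'Single': 5}
print(count_by_axis(data, axis=1))  # [3, 2, 3, 3, 3]
```

Only the row holding the missing Age has a count of 2; every other row and column is complete.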

On a Series:

>>> df['Person'].count()
5

>>> df['Age'].count()
4

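Note: This article was selected and compiled by 純淨天空 from the original English documentation for pyspark.pandas.Series.count on spark.apache.org. Unless otherwise stated, copyright in the original code belongs to the original author; do not reproduce or copy this translation without permission or authorization.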