Python pyspark Series.count usage and code examples

This article briefly introduces the usage of pyspark.pandas.Series.count.

Usage: Series.count(axis: Union[int, str, None] = None, numeric_only: bool = False) → Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, pyspark.pandas.series.Series]

Count non-NA cells for each column.

The values None and NaN are considered NA.
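The NA rule above can be illustrated without Spark. The following is a minimal pure-Python sketch, not the pyspark implementation; the helper names `is_na` and `count_non_na` are hypothetical and exist only for this illustration.

```python
import math


def is_na(value):
    """Return True for the two values treated as NA here: None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_non_na(values):
    """Count the entries that are not NA (hypothetical helper, not a pyspark API)."""
    return sum(1 for v in values if not is_na(v))


ages = [24.0, float("nan"), 21.0, 33.0, 26.0]
names = ["John", None, "Lewis", "John", "Myla"]

print(count_non_na(ages))   # 4: the NaN entry is skipped
print(count_non_na(names))  # 4: the None entry is skipped
```

Note that an empty string or a zero is a regular value under this rule; only None and NaN are skipped.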

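Parameters:

axis: {0 or 'index', 1 or 'columns'}, default 0: If 0 or 'index', counts are generated for each column. If 1 or 'columns', counts are generated for each row.
numeric_only: bool, default False: If True, include only float, int, and boolean columns. This parameter exists mainly for pandas compatibility.

Returns:

A scalar for a Series, and a Series for a DataFrame.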
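To make the `axis` semantics concrete, here is a small pure-Python sketch that counts non-NA cells per column or per row of a table stored as a dict of lists. It only models the documented behavior; `count_table` and `is_na` are hypothetical names, and the sketch does not use or reproduce pyspark internals.

```python
import math


def is_na(value):
    """None and float NaN are treated as NA."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_table(columns, axis=0):
    """Count non-NA cells in a dict mapping column name -> list of values.

    axis=0 or 'index'   -> one count per column (returned as a dict)
    axis=1 or 'columns' -> one count per row (returned as a list)
    """
    if axis in (0, "index"):
        return {name: sum(not is_na(v) for v in vals)
                for name, vals in columns.items()}
    if axis in (1, "columns"):
        # zip(*...) walks the table row by row
        return [sum(not is_na(v) for v in row)
                for row in zip(*columns.values())]
    raise ValueError(f"unsupported axis: {axis!r}")


table = {
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, float("nan"), 21.0, 33.0, 26.0],
    "Single": [False, True, True, True, False],
}

per_column = count_table(table)          # {'Person': 5, 'Age': 4, 'Single': 5}
per_row = count_table(table, axis=1)     # [3, 2, 3, 3, 3]
print(per_column)
print(per_row)
```

A single column corresponds to the Series case: its count is one scalar, which is why the return type is a scalar for a Series and a Series of counts for a DataFrame.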

Examples:

Constructing a DataFrame from a dictionary:

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]},
...                   columns=["Person", "Age", "Single"])
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2  Lewis  21.0    True
3   John  33.0    True
4   Myla  26.0   False

Notice that NA values are not counted:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

>>> df.count(axis=1)
0    3
1    2
2    3
3    3
4    3
dtype: int64

On a Series:

>>> df['Person'].count()
5

>>> df['Age'].count()
4

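Note: This article was selected and compiled by 纯净天空 from the original English documentation for pyspark.pandas.Series.count on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.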