Python pyspark Series.pipe Usage and Code Examples

This article briefly describes the usage of pyspark.pandas.Series.pipe.

Usage:
Series.pipe(func: Callable[[…], Any], *args: Any, **kwargs: Any) → Any

Applies func(self, *args, **kwargs).

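Parameters:

func: function. The function to apply to the Series. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple, where data_keyword is a string naming the keyword of callable that expects the Series.
args: iterable, optional. Positional arguments passed into func.
kwargs: mapping, optional. A dictionary of keyword arguments passed into func.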

Returns:

object: the return type of func.
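The dispatch rule described in the parameters above can be sketched in plain Python. This is only an illustrative sketch, not the pyspark source: pipe_sketch, scale, and scale_kw are hypothetical names, and a plain list stands in for the Series. The check that rejects a data keyword supplied twice is an assumption about sensible behaviour; the text above does not state it.

```python
from typing import Any, Callable, Tuple, Union

# Either a plain callable or a (callable, data_keyword) tuple.
PipeTarget = Union[Callable[..., Any], Tuple[Callable[..., Any], str]]


def pipe_sketch(obj: Any, func: PipeTarget, *args: Any, **kwargs: Any) -> Any:
    """Hypothetical stand-in that mirrors the documented behaviour of .pipe."""
    if isinstance(func, tuple):
        # (callable, data_keyword): pass obj under the named keyword.
        func, data_keyword = func
        if data_keyword in kwargs:
            # Assumption: supplying the data keyword twice is treated as an error.
            raise ValueError(
                f"{data_keyword} is both the pipe target and a keyword argument"
            )
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    # Plain callable: obj becomes the first positional argument.
    return func(obj, *args, **kwargs)


def scale(values, factor):
    # Data arrives as the first positional argument.
    return [v * factor for v in values]


def scale_kw(factor, data):
    # Data arrives under the keyword "data", not in first position.
    return [v * factor for v in data]


data = [1, 2, 3]
first = pipe_sketch(data, scale, 10)                # same as scale(data, 10)
second = pipe_sketch(data, (scale_kw, "data"), 10)  # same as scale_kw(10, data=data)
```

Both calls produce the same result. The tuple form exists only so that functions whose data parameter is not first can still take part in a chain.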

Notes:

Use .pipe when chaining together functions that expect Series, DataFrames, or GroupBy objects. For example, given

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'category': ['A', 'A', 'B'],
...                    'col1': [1, 2, 3],
...                    'col2': [4, 5, 6]},
...                   columns=['category', 'col1', 'col2'])
>>> def keep_category_a(df):
...     return df[df['category'] == 'A']
>>> def add_one(df, column):
...     return df.assign(col3=df[column] + 1)
>>> def multiply(df, column1, column2):
...     return df.assign(col4=df[column1] * df[column2])

instead of writing

>>> multiply(add_one(keep_category_a(df), column="col1"), column1="col2", column2="col3")
  category  col1  col2  col3  col4
0        A     1     4     2     8
1        A     2     5     3    15

you can write

>>> (df.pipe(keep_category_a)
...    .pipe(add_one, column="col1")
...    .pipe(multiply, column1="col2", column2="col3")
... )
  category  col1  col2  col3  col4
0        A     1     4     2     8
1        A     2     5     3    15

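If you have a function that takes the data as (say) its second argument, pass a tuple indicating which keyword expects the data. For example, suppose multiply_2 takes its data as df: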

>>> def multiply_2(column1, df, column2):
...     return df.assign(col4=df[column1] * df[column2])

Then you can write

>>> (df.pipe(keep_category_a)
...    .pipe(add_one, column="col1")
...    .pipe((multiply_2, 'df'), column1="col2", column2="col3")
... )
  category  col1  col2  col3  col4
0        A     1     4     2     8
1        A     2     5     3    15

You can also use a lambda

>>> ps.Series([1, 2, 3]).pipe(lambda x: (x + 1).rename("value"))
0    2
1    3
2    4
Name: value, dtype: int64

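Note: This article was selected and compiled by 纯净天空 from the original English work pyspark.pandas.Series.pipe on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original author. Please do not reproduce or copy this translation without permission or authorization.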