Python pyspark DataFrame.pipe usage and code examples

This article briefly introduces the usage of pyspark.pandas.DataFrame.pipe.

Usage:
DataFrame.pipe(func: Callable[[…], Any], *args: Any, **kwargs: Any) → Any

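Apply func(self, *args, **kwargs).

Parameters:

func: function: The function to apply to the DataFrame. args and kwargs are passed into func. Alternatively, a (callable, data_keyword) tuple, where data_keyword is a string naming the keyword of callable that expects the DataFrame.
args: iterable, optional: Positional arguments passed into func.
kwargs: mapping, optional: A dictionary of keyword arguments passed into func.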

Returns:

object: the return type of func.
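
The dispatch described by the parameters above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's source: pipe_sketch, add and scale are hypothetical names, plain lists stand in for DataFrames so the snippet runs without Spark, and the ValueError raised when the data keyword is also supplied by the caller is an assumption modelled on how pandas handles that conflict.

```python
from typing import Any, Callable, Tuple, Union


def pipe_sketch(
    obj: Any,
    func: Union[Callable[..., Any], Tuple[Callable[..., Any], str]],
    *args: Any,
    **kwargs: Any,
) -> Any:
    """Hypothetical stand-in for the dispatch that .pipe is documented to perform."""
    if isinstance(func, tuple):
        # (callable, data_keyword) form: the data is passed by keyword.
        func, data_keyword = func
        if data_keyword in kwargs:
            # Assumed behaviour: refuse an ambiguous call.
            raise ValueError(
                f"{data_keyword} is both the pipe target and a keyword argument"
            )
        kwargs[data_keyword] = obj
        return func(*args, **kwargs)
    # Plain callable form: the data is the first positional argument.
    return func(obj, *args, **kwargs)


def add(values, amount):
    return [v + amount for v in values]


def scale(factor, values):
    # The data is NOT the first parameter here.
    return [v * factor for v in values]


first = pipe_sketch([1, 2, 3], add, 1)                      # add([1, 2, 3], 1)
second = pipe_sketch(first, (scale, "values"), factor=10)   # scale(factor=10, values=first)
print(second)  # [20, 30, 40]
```

Whatever func returns is returned unchanged, which is why the documented return type is simply that of func.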

Notes:

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. For example, given

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'category': ['A', 'A', 'B'],
...                    'col1': [1, 2, 3],
...                    'col2': [4, 5, 6]},
...                   columns=['category', 'col1', 'col2'])
>>> def keep_category_a(df):
...     return df[df['category'] == 'A']
>>> def add_one(df, column):
...     return df.assign(col3=df[column] + 1)
>>> def multiply(df, column1, column2):
...     return df.assign(col4=df[column1] * df[column2])

instead of writing

>>> multiply(add_one(keep_category_a(df), column="col1"), column1="col2", column2="col3")
  category  col1  col2  col3  col4
0        A     1     4     2     8
1        A     2     5     3    15

you can write

>>> (df.pipe(keep_category_a)
...    .pipe(add_one, column="col1")
...    .pipe(multiply, column1="col2", column2="col3")
... )
  category  col1  col2  col3  col4
0        A     1     4     2     8
1        A     2     5     3    15

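If you have a function that takes the data as (say) its second argument, pass a tuple indicating which keyword expects the data. For example, suppose multiply_2 takes its data as df: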

>>> def multiply_2(column1, df, column2):
...     return df.assign(col4=df[column1] * df[column2])

Then you can write

>>> (df.pipe(keep_category_a)
...    .pipe(add_one, column="col1")
...    .pipe((multiply_2, 'df'), column1="col2", column2="col3")
... )
  category  col1  col2  col3  col4
0        A     1     4     2     8
1        A     2     5     3    15
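
Where the tuple form is not convenient, wrapping the function in a lambda that places the data in the right position gives the same result. The sketch below uses plain lists and a hypothetical pipe_first helper standing in for the plain-callable form of .pipe, so that it runs without Spark; multiply_by is likewise a made-up function whose data parameter comes second, as with multiply_2.

```python
def pipe_first(obj, func, *args, **kwargs):
    # Hypothetical stand-in for the plain-callable form of .pipe.
    return func(obj, *args, **kwargs)


def multiply_by(factor, values, offset):
    # The data (values) is the second parameter, not the first.
    return [v * factor + offset for v in values]


data = [1, 2, 3]

# The lambda receives the data first and forwards it to the second slot.
result = pipe_first(data, lambda v: multiply_by(2, v, 1))
print(result)  # [3, 5, 7]
```

The lambda keeps the call explicit about where the data goes, at the cost of hiding the wrapped function's name from the chain.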

You can also use a lambda

>>> ps.Series([1, 2, 3]).pipe(lambda x: (x + 1).rename("value"))
0    2
1    3
2    4
Name: value, dtype: int64

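Note: This article was selected and compiled by 純淨天空 from the original English documentation for pyspark.pandas.DataFrame.pipe at spark.apache.org. Unless otherwise stated, copyright in the original code belongs to its original authors. Do not reproduce or copy this translation without permission or authorization.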
