Python PySpark DataFrame collect method: usage and code examples

The collect() method of a PySpark DataFrame returns all of the DataFrame's records as a list of Row objects.

Return value

A list of Row objects.

Examples

Consider the following PySpark DataFrame:

df = spark.createDataFrame([["Alex", 25], ["Bob", 30]], ["name", "age"])
df.show()



+----+---+
|name|age|
+----+---+
|Alex| 25|
| Bob| 30|
+----+---+

Getting all rows of a PySpark DataFrame as a list of Row objects

To get all rows as a list of Row objects:

df.collect()



[Row(name='Alex', age=25), Row(name='Bob', age=30)]

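Warning

Under the hood, the collect() method sends all of the data spread across the worker nodes to the driver node. This means that if the dataset is large, the driver can run out of memory and throw an error.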

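Note: This article was selected and compiled by 纯净天空 from the original English work "PySpark DataFrame | collect method" by Isshin Inada. Unless otherwise stated, copyright in the original code belongs to the original author; do not reproduce or copy this translation without permission or authorization.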