In this article, we will see how to convert a PySpark DataFrame into a dictionary in which the keys are the column names and the values are the column values.
Before we begin, let's create a sample DataFrame:
Python3
# Importing necessary libraries
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName('DF_to_dict').getOrCreate()

# Create data for the DataFrame
data = [('Ram', '1991-04-01', 'M', 3000),
        ('Mike', '2000-05-19', 'M', 4000),
        ('Rohini', '1978-09-05', 'M', 4000),
        ('Maria', '1967-12-01', 'F', 4000),
        ('Jenis', '1980-02-17', 'F', 1200)]

# Column names for the DataFrame
columns = ["Name", "DOB", "Gender", "salary"]

# Create the Spark DataFrame
df = spark.createDataFrame(data=data, schema=columns)

# Print the DataFrame
df.show()
Output:
Method 1: Using df.toPandas()
Convert the PySpark DataFrame into a pandas DataFrame using df.toPandas().
Syntax: DataFrame.toPandas()
Return type: Returns a pandas DataFrame with the same contents as the PySpark DataFrame.
Then take each column's values and add the list of values to a dictionary with the column name as the key.
Python3
# Declare an empty dictionary
# (named result so it does not shadow the built-in dict)
result = {}

# Convert the PySpark DataFrame to a
# pandas DataFrame
df = df.toPandas()

# Traverse each column
for column in df.columns:
    # Add the column name as the key and
    # the list of column values as the value
    result[column] = df[column].values.tolist()

# Print the dictionary
print(result)
Output:
{'Name': ['Ram', 'Mike', 'Rohini', 'Maria', 'Jenis'],
'DOB': ['1991-04-01', '2000-05-19', '1978-09-05', '1967-12-01', '1980-02-17'],
'Gender': ['M', 'M', 'M', 'F', 'F'],
'salary': [3000, 4000, 4000, 4000, 1200]}
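If a record-oriented result (one dictionary per row) is needed instead, each PySpark Row object also exposes an asDict() method, so [row.asDict() for row in df.collect()] would produce it. Below is a minimal stdlib sketch of the two shapes using plain tuples, so pyspark is not required to run it; the sample rows are a subset of the data above:

```python
# Stand-in for the collected data: plain tuples instead of pyspark Rows.
columns = ["Name", "DOB", "Gender", "salary"]
rows = [("Ram", "1991-04-01", "M", 3000),
        ("Mike", "2000-05-19", "M", 4000)]

# Column-oriented: {column -> [values]}, the shape Method 1 produces.
col_dict = {col: [row[i] for row in rows] for i, col in enumerate(columns)}
print(col_dict)

# Record-oriented: [{column -> value}, ...], the shape Row.asDict() gives per row.
records = [dict(zip(columns, row)) for row in rows]
print(records)
```

The column-oriented shape is compact for analytics; the record-oriented shape is convenient when each row is handled as a unit (e.g. serialized to JSON).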
Method 2: Using df.collect()
Convert the PySpark DataFrame into a list of rows; collect() returns all the records of the DataFrame as a list.
Syntax: DataFrame.collect()
Return type: Returns all the records of the DataFrame as a list of rows.
Python3
import numpy as np

# Convert the DataFrame into a list
# of rows
rows = [list(row) for row in df.collect()]

# Convert the list into a NumPy array
ar = np.array(rows)

# Declare an empty dictionary
result = {}

# Go through each column
for i, column in enumerate(df.columns):
    # Add the ith column's values to the dictionary
    # with the ith column name as the key
    result[column] = list(ar[:, i])

# Print the dictionary
print(result)
Output:
{'Name': ['Ram', 'Mike', 'Rohini', 'Maria', 'Jenis'],
'DOB': ['1991-04-01', '2000-05-19', '1978-09-05', '1967-12-01', '1980-02-17'],
'Gender': ['M', 'M', 'M', 'F', 'F'],
'salary': ['3000', '4000', '4000', '4000', '1200']}
Note that the salary values come back as strings: a NumPy array holds a single dtype, so the mixed-type rows are coerced to strings.
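To avoid that coercion, the transpose can be done in pure Python with zip(*rows), which keeps each column's original types. A minimal sketch, assuming rows is a list of row lists as built above (plain sample data here, so no pyspark or numpy is needed to run it):

```python
# Sample rows as Method 2's collect() would yield them (plain lists here).
columns = ["Name", "DOB", "Gender", "salary"]
rows = [["Ram", "1991-04-01", "M", 3000],
        ["Mike", "2000-05-19", "M", 4000]]

# zip(*rows) transposes the rows into column tuples without any dtype
# coercion, so the integer salaries stay integers.
result = {col: list(values) for col, values in zip(columns, zip(*rows))}
print(result)
```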
Method 3: Using pandas.DataFrame.to_dict()
A pandas DataFrame can be converted directly into a dictionary using the to_dict() method.
Syntax: DataFrame.to_dict(orient='dict')
Parameters:
- orient: Determines the type of the dictionary values. It accepts values such as {'dict', 'list', 'series', 'split', 'records', 'index'}.
Return type: Returns the dictionary corresponding to the DataFrame.
Code:
Python3
# Convert the PySpark DataFrame to a
# pandas DataFrame
df = df.toPandas()

# Convert the DataFrame into a
# dictionary
result = df.to_dict(orient='list')

# Print the dictionary
print(result)
Output:
{'Name': ['Ram', 'Mike', 'Rohini', 'Maria', 'Jenis'],
'DOB': ['1991-04-01', '2000-05-19', '1978-09-05', '1967-12-01', '1980-02-17'],
'Gender': ['M', 'M', 'M', 'F', 'F'],
'salary': [3000, 4000, 4000, 4000, 1200]}
Converting a DataFrame with 2 columns into a dictionary: create a DataFrame with two columns named "Location" and "House_price".
Python3
# Importing necessary libraries
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName('DF_to_dict').getOrCreate()

# Create data for the DataFrame
data = [('Hyderabad', 120000),
        ('Delhi', 124000),
        ('Mumbai', 344000),
        ('Guntur', 454000),
        ('Bandra', 111200)]

# Column names for the DataFrame
columns = ["Location", "House_price"]

# Create the Spark DataFrame
df = spark.createDataFrame(data=data, schema=columns)

# Print the DataFrame
print('Dataframe : ')
df.show()

# Convert the PySpark DataFrame to a
# pandas DataFrame
df = df.toPandas()

# Convert the DataFrame into a
# dictionary
result = df.to_dict(orient='list')

# Print the dictionary
print('Dictionary :')
print(result)
Output:
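The examples above used orient='list'; the other orient values reshape the same data rather than change it. A stdlib sketch of the three most common shapes ('dict', 'list', and 'records'), built by hand from a subset of the Location data so pandas is not needed to run it:

```python
columns = ["Location", "House_price"]
rows = [("Hyderabad", 120000), ("Delhi", 124000)]

# orient='list': {column -> [values]}, as used in the examples above.
as_list = {col: [r[i] for r in rows] for i, col in enumerate(columns)}

# orient='dict' (the default): {column -> {row_index -> value}}.
as_dict = {col: {i: r[j] for i, r in enumerate(rows)}
           for j, col in enumerate(columns)}

# orient='records': [{column -> value}, ...], one dict per row.
as_records = [dict(zip(columns, r)) for r in rows]

print(as_list)
print(as_dict)
print(as_records)
```

Choosing the orient up front saves a reshaping pass later: 'list' suits column-wise processing, 'records' suits row-wise serialization, and 'dict' preserves the row index.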
Note: This article is a translation, curated by 純淨天空, of the original English article "Convert PySpark DataFrame to Dictionary in Python" by ManikantaBandla. Unless otherwise stated, the copyright of the original code belongs to the original author; please do not reproduce or copy this translation without permission or authorization.