Python Pandas merge_ordered method: usage and code examples

Pandas' merge_ordered(~) method joins two DataFrames, ordering the result by the join keys, and can optionally fill the missing values that the join produces.

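Parameters

1. left | DataFrame

The left DataFrame to join.

2. right | DataFrame

The right DataFrame to join.

3. on | string or list

The label(s) of the column(s) to join on.

Note

The on parameter is only a convenience. If the columns to join on have different labels, you must specify left_on and right_on instead.

4. left_on | string or array-like

The label(s) of the column(s) in left to join on.

5. right_on | string or array-like

The label(s) of the column(s) in right to join on.

6. left_by | string or list<string>

The label(s) of the column(s) in left to "expand": left is grouped by these columns, and each group is joined with right separately. See the examples below for clarification.

7. right_by | string or list<string>

The label(s) of the column(s) in right to "expand": right is grouped by these columns, and each group is joined with left separately. See the left_by example below; right_by works the same way with the roles of the two DataFrames swapped.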

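8. fill_method | string or None | optional

How to fill NaN in the merged DataFrame:

Value	Description
`"ffill"`	Fill using the previous non-`NaN` value.
`None`	Leave `NaN` as is.

By default, fill_method=None.

9. suffixes | tuple of (string, string) | optional

The suffixes to append to duplicate column labels in the resulting DataFrame. You can also pass a single None in place of one of the strings in suffixes to indicate that the left or right column label should be left as is. By default, suffixes=("_x", "_y").

10. how | string | optional

The type of join to perform:

Value	Description
`"left"`	All rows from the left DataFrame appear in the resulting DataFrame. This is equivalent to a left join in SQL.
`"right"`	All rows from the right DataFrame appear in the resulting DataFrame. This is equivalent to a right join in SQL.
`"outer"`	All rows from both the left and right DataFrames appear in the resulting DataFrame. This is equivalent to an outer join in SQL.
`"inner"`	Only rows whose join keys match in both the left and right DataFrames appear in the resulting DataFrame. This is equivalent to an inner join in SQL.

By default, how="outer".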
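To make the four join types concrete, here is a minimal sketch that counts the rows each one produces. It reuses the small product and customer DataFrames from the examples further down; the row_counts dictionary is only an illustrative name.

```python
import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "david"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

# Count how many rows each join type keeps
row_counts = {}
for how in ["left", "right", "outer", "inner"]:
    merged = pd.merge_ordered(df_products, df_customers,
                              left_on="bought_by", right_on="name", how=how)
    row_counts[how] = len(merged)

print(row_counts)
```

With this data, "left" and "right" each keep three rows, "outer" keeps four, and "inner" keeps two, since only alex and bob appear in both DataFrames.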

[Figure: Venn diagram illustrating the differences between the join types]

Return value

The merged DataFrame.

Examples

Basic usage

Consider a store that has some data about its products and customers:

import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
 "bought_by": ["bob", "alex", "david"]},
 index=["A","B","C"])
df_customers = pd.DataFrame({"name":["alex","bob","cathy"], "age":[10, 20, 30]})



        [df_products]         |   [df_customers]
   product      bought_by     |        name   age
A  computer       bob         |     0  alex   10
B  smartphone     alex        |     1  bob    20
C  headphones     david       |     2  cathy  30

To perform an outer join of the two DataFrames on the columns bought_by and name:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")



   product     bought_by  name   age
0  smartphone  alex       alex   10.0
1  computer    bob        bob    20.0
2  NaN         NaN        cathy  30.0
3  headphones  david      NaN    NaN
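The note on the on parameter says that it is a shorthand for the case where both key columns share a label. The sketch below illustrates this by renaming bought_by to name (a renaming done purely for illustration, stored in the hypothetical df_products_renamed) so that on="name" can be used.

```python
import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "david"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

# Rename the left key so that both DataFrames share the label "name" (illustration only)
df_products_renamed = df_products.rename(columns={"bought_by": "name"})

# With a shared label, on= replaces the left_on / right_on pair
merged_on = pd.merge_ordered(df_products_renamed, df_customers, on="name", how="outer")
print(merged_on)
```

In this case the result has a single key column, name, and its rows are ordered by that key, just like the output above.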

Specifying fill_method

Unlike merge(~), merge_ordered(~) lets you fill the missing values that arise from the join.

Consider again the same DataFrames as above:

        [df_products]         |   [df_customers]
   product      bought_by     |        name   age
A  computer       bob         |     0  alex   10
B  smartphone     alex        |     1  bob    20
C  headphones     david       |     2  cathy  30

By default, fill_method=None, which means that the resulting NaN are left as is:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")



   product     bought_by  name   age
0  smartphone  alex       alex   10.0
1  computer    bob        bob    20.0
2  NaN         NaN        cathy  30.0
3  headphones  david      NaN    NaN

To fill these NaN, set fill_method="ffill" like so:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", fill_method="ffill")



   product     bought_by  name   age
0  smartphone  alex       alex   10
1  computer    bob        bob    20
2  computer    bob        cathy  30
3  headphones  david      cathy  30

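Notice how every NaN is filled with the previous non-NaN value.

Warning

Note that this example only serves to illustrate how the fill works; in practice we would never fill data like this. The practical use cases for this fill logic are mostly reserved for time series, where carrying forward the value recorded at the previous datetime makes more sense.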
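As a rough illustration of the time-series use case, the sketch below merges two series that were recorded on different days. The DataFrames df_prices and df_rates and all of their values are hypothetical and are not part of the original example.

```python
import pandas as pd

# Hypothetical data: prices and exchange rates recorded on different days
df_prices = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-03", "2023-01-05"]),
    "price": [100, 105, 103],
})
df_rates = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02", "2023-01-04"]),
    "rate": [1.10, 1.12],
})

# Rows are ordered by date, and each gap is filled with the most recent earlier value
merged_ts = pd.merge_ordered(df_prices, df_rates, on="date", fill_method="ffill")
print(merged_ts)
```

Each date then carries the latest known price and rate. The first row still has no rate, because no earlier observation exists to carry forward.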

Specifying left_by

Let's use the same example as above:

        [df_products]         |   [df_customers]
   product      bought_by     |        name   age
A  computer       bob         |     0  alex   10
B  smartphone     alex        |     1  bob    20
C  headphones     david       |     2  cathy  30

By default, left_by=None, which means that the resulting DataFrame is built using a conventional join:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")



   product     bought_by  name   age
0  smartphone  alex       alex   10.0
1  computer    bob        bob    20.0
2  NaN         NaN        cathy  30.0
3  headphones  david      NaN    NaN

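Setting left_by="product" groups the left DataFrame by product and joins each group with the right DataFrame separately, so each product is repeated for every value of the join key: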

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", left_by="product")



   product    bought_by  age   name
0  computer    NaN       10.0  alex
1  computer    bob       20.0  bob
2  computer    NaN       30.0  cathy
3  smartphone  alex      10.0  alex
4  smartphone  NaN       20.0  bob
5  smartphone  NaN       30.0  cathy
6  headphones  NaN       10.0  alex
7  headphones  NaN       20.0  bob
8  headphones  NaN       30.0  cathy
9  headphones  david     NaN   NaN
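The right_by parameter is described above but has no example of its own. The following sketch, which uses hypothetical DataFrames df_days and df_sales, assumes that right_by mirrors left_by: the right DataFrame is grouped by store, and each group is joined with the full left DataFrame.

```python
import pandas as pd

# Hypothetical data: a list of days, and sales that are recorded only on some days
df_days = pd.DataFrame({"day": [1, 2, 3]})
df_sales = pd.DataFrame({"store": ["A", "A", "B"],
                         "day": [1, 3, 2],
                         "sales": [10, 30, 20]})

# Group the right DataFrame by store and join every group with all the days
merged_by = pd.merge_ordered(df_days, df_sales, on="day", right_by="store")
print(merged_by)
```

Under that assumption, every store gets one row per day, with NaN in sales for the days on which that store has no record.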

Specifying suffixes

Consider the following DataFrames:

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
 "age": [7,8,9],
 "bought_by": ["bob", "alex", "bob"]},
 index=["A","B","C"])
df_customers = pd.DataFrame({"name":["alex","bob","cathy"], "age":[10, 20, 30]})



        [df_products]            |   [df_customers]
   product      age  bought_by   |        name   age
A  computer      7   bob         |     0  alex   10
B  smartphone    8   alex        |     1  bob    20
C  headphones    9   bob         |     2  cathy  30

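Notice how the two DataFrames share an overlapping column label: age.

By default, suffixes=("_x","_y"), which means that if the merged DataFrame has overlapping column labels, the suffix "_x" is appended to the overlapping label from the left DataFrame and "_y" to the one from the right DataFrame: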

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")



   product      age_x  bought_by  name   age_y
0  smartphone   8.0    alex       alex    10
1  computer     7.0    bob        bob     20
...

We can specify our own suffixes like so:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", suffixes=["_A","_B"])



   product     age_A  bought_by  name  age_B
0  smartphone  8.0    alex       alex  10
1  computer    7.0    bob        bob   20
...

You can also pass None instead of a string to indicate that the left or right column label should be left as is:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", suffixes=[None,"_B"])



   product     age  bought_by  name  age_B
0  smartphone  8.0  alex       alex  10
1  computer    7.0  bob        bob   20
...
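The three suffix settings above can be compared side by side with a short sketch that collects only the resulting column labels. The labels dictionary is an illustrative name and not part of the original example.

```python
import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "age": [7, 8, 9],
                            "bought_by": ["bob", "alex", "bob"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})

# Record the column labels produced by each suffixes setting
labels = {}
for suffixes in [("_x", "_y"), ("_A", "_B"), (None, "_B")]:
    merged = pd.merge_ordered(df_products, df_customers, left_on="bought_by",
                              right_on="name", how="outer", suffixes=suffixes)
    labels[suffixes] = list(merged.columns)
    print(suffixes, labels[suffixes])
```

Only the overlapping label age is affected; the other column labels stay the same in all three cases.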

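Note: This article was selected and compiled by 純淨天空 from the original English work "Pandas | merge_ordered method" by Isshin Inada. Unless otherwise stated, the copyright of the original code belongs to the original author; please do not reproduce or copy this translation without permission or authorization.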