Python Pandas merge_ordered Method: Usage and Code Examples

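The Pandas merge_ordered(~) method joins two DataFrames, returns the rows ordered by the join key, and can optionally forward-fill the missing values that result from the join.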

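Parameters

1. left | DataFrame

The left DataFrame to join.

2. right | DataFrame

The right DataFrame to join.

3. on | string or list

The label(s) of the column(s) to join on.

Note

The on parameter exists only for convenience, for the case where the join columns have the same label in both DataFrames. If the columns to join on have different labels, you must specify left_on and right_on instead.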

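4. left_on | string or array-like

The label(s) of the column(s) in left to join on.

5. right_on | string or array-like

The label(s) of the column(s) in right to join on.

6. left_by | string or list<string>

The label(s) of the column(s) in left to group by. The join is performed separately for each group of left against the whole of right, so the rows of right are repeated ("expanded") for every group. See the examples below for clarification.

7. right_by | string or list<string>

The label(s) of the column(s) in right to group by. This is the mirror case of left_by: the join is performed separately for each group of right against the whole of left. The left_by example below illustrates the behavior.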

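8. fill_method | string or None | optional

How to fill the NaN values in the merged DataFrame:

Value	Description
`"ffill"`	Fill with the previous non-`NaN` value.
`None`	Leave `NaN` as is.

By default, fill_method=None.

9. suffixes | tuple of (string, string) | optional

The suffixes to append to duplicate column labels in the resulting DataFrame. You can also pass a single None in place of one of the strings in suffixes to indicate that the left or right column label should be left as is. By default, suffixes=("_x", "_y").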

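10. how | string | optional

The type of join to perform:

Value	Description
`"left"`	All rows from the left DataFrame appear in the resulting DataFrame. Equivalent to a SQL left join.
`"right"`	All rows from the right DataFrame appear in the resulting DataFrame. Equivalent to a SQL right join.
`"outer"`	All rows from both the left and right DataFrames appear in the resulting DataFrame. Equivalent to a SQL full outer join.
`"inner"`	Only rows whose join key has a match in both the left and right DataFrames appear in the resulting DataFrame. Equivalent to a SQL inner join.

By default, how="outer".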
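
To make the four join types concrete, here is a minimal sketch using two small hypothetical DataFrames (left, right and their column names are made up for illustration and do not appear elsewhere in this article). It records which join keys survive under each value of how:

```python
import pandas as pd

# Hypothetical data: both frames share the key label "key".
left = pd.DataFrame({"key": ["a", "c", "e"], "lvalue": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rvalue": [10, 20, 30]})

# Collect the join keys present in the result for each join type.
keys_by_how = {}
for how in ("left", "right", "outer", "inner"):
    merged = pd.merge_ordered(left, right, on="key", how=how)
    keys_by_how[how] = merged["key"].tolist()

for how, keys in keys_by_how.items():
    print(how, keys)
# left ['a', 'c', 'e']
# right ['b', 'c', 'd']
# outer ['a', 'b', 'c', 'd', 'e']
# inner ['c']
```

In every case the rows come back ordered by the join key, which is what distinguishes merge_ordered(~) from a plain merge(~).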

Return value

The merged DataFrame.

Examples

Basic usage

Consider a store that holds some data about its products and customers:

import pandas as pd

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
                            "bought_by": ["bob", "alex", "david"]},
                           index=["A", "B", "C"])
df_customers = pd.DataFrame({"name": ["alex", "bob", "cathy"], "age": [10, 20, 30]})



        [df_products]         |   [df_customers]
   product      bought_by     |        name   age
A  computer       bob         |     0  alex   10
B  smartphone     alex        |     1  bob    20
C  headphones     david       |     2  cathy  30

To perform an outer join on the two DataFrames using the columns bought_by and name:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")



   product     bought_by  name   age
0  smartphone  alex       alex   10.0
1  computer    bob        bob    20.0
2  NaN         NaN        cathy  30.0
3  headphones  david      NaN    NaN
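
The call above needs left_on and right_on because the key columns carry different labels (bought_by and name). When both DataFrames use the same label, on alone is enough. A minimal sketch, assuming two hypothetical frames df_left and df_right that share a column named key:

```python
import pandas as pd

# Hypothetical frames that share the key label "key".
df_left = pd.DataFrame({"key": ["a", "c"], "lvalue": [1, 2]})
df_right = pd.DataFrame({"key": ["b", "c"], "rvalue": [10, 20]})

# Same label on both sides: `on` is sufficient, and the result
# contains a single "key" column.
merged_on = pd.merge_ordered(df_left, df_right, on="key")
print(merged_on["key"].tolist())  # ['a', 'b', 'c']

# Different labels: rename the right key to simulate the situation,
# then name each side explicitly. Both key columns are kept.
df_right_renamed = df_right.rename(columns={"key": "rkey"})
merged_lr = pd.merge_ordered(df_left, df_right_renamed,
                             left_on="key", right_on="rkey")
print(list(merged_lr.columns))
```

With on, the key values from both sides are combined into one column; with left_on and right_on, each side keeps its own key column, which is why the output above shows NaN under bought_by for cathy and under name for david.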

Specifying fill_method

Unlike merge(~), merge_ordered(~) can fill the missing values that arise from the join.

Consider again the same DataFrames as above:

        [df_products]         |   [df_customers]
   product      bought_by     |        name   age
A  computer       bob         |     0  alex   10
B  smartphone     alex        |     1  bob    20
C  headphones     david       |     2  cathy  30

By default, fill_method=None, which means that the resulting NaN values are left as they are:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")



   product     bought_by  name   age
0  smartphone  alex       alex   10.0
1  computer    bob        bob    20.0
2  NaN         NaN        cathy  30.0
3  headphones  david      NaN    NaN

To fill these NaN values, set fill_method="ffill" like so:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", fill_method="ffill")



   product     bought_by  name   age
0  smartphone  alex       alex   10
1  computer    bob        bob    20
2  computer    bob        cathy  30
3  headphones  david      cathy  30

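Notice how every NaN has been filled with the previous non-NaN value in the same column.

Warning

This example only illustrates how the filling works; in practice we would never fill data like this. Forward filling is mainly useful for time series, where carrying the most recently recorded value forward makes sense.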
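
As a more realistic illustration, here is a minimal time-series sketch. The frames prices and volumes, their columns, and all values are hypothetical and chosen only to show forward filling on dates that appear in one frame but not the other:

```python
import pandas as pd

# Hypothetical observations recorded on different days.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-05"]),
    "price": [100.0, 102.0, 101.0],
})
volumes = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "volume": [5, 7, 6],
})

# Outer join ordered by date; gaps are filled with the last known value.
filled = pd.merge_ordered(prices, volumes, on="date", fill_method="ffill")
print(filled)
```

Each day that is missing from one frame takes the most recent earlier value from that frame. The first row keeps NaN under volume because no earlier volume exists to carry forward.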

Specifying left_by

Let us use the same example as above:

[df_products]         |   [df_customers]
   product      bought_by     |        name   age
A  computer       bob         |     0  alex   10
B  smartphone     alex        |     1  bob    20
C  headphones     david       |     2  cathy  30

By default, left_by=None, which means that the resulting DataFrame is built with a conventional join:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")



   product     bought_by  name   age
0  smartphone  alex       alex   10.0
1  computer    bob        bob    20.0
2  NaN         NaN        cathy  30.0
3  headphones  david      NaN    NaN

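Setting left_by="product" groups df_products by product and joins each group with df_customers separately, so every product is paired with every row of df_customers: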

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", left_by="product")



   product    bought_by  age   name
0  computer    NaN       10.0  alex
1  computer    bob       20.0  bob
2  computer    NaN       30.0  cathy
3  smartphone  alex      10.0  alex
4  smartphone  NaN       20.0  bob
5  smartphone  NaN       30.0  cathy
6  headphones  NaN       10.0  alex
7  headphones  NaN       20.0  bob
8  headphones  NaN       30.0  cathy
9  headphones  david     NaN   NaN
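
The grouping is easier to see when each group holds several rows. The following minimal sketch uses hypothetical frames (readings, lookup and their columns are made up for illustration) and also shows that left_by and right_by cannot be combined, since only one side can be grouped:

```python
import pandas as pd

# Hypothetical left frame with two groups, "a" and "b".
readings = pd.DataFrame({
    "key": ["a", "c", "e", "a", "c", "e"],
    "lvalue": [1, 2, 3, 1, 2, 3],
    "group": ["a", "a", "a", "b", "b", "b"],
})
# Hypothetical right frame with no group column.
lookup = pd.DataFrame({"key": ["b", "c", "d"], "rvalue": [1, 2, 3]})

# Each group of `readings` is outer-joined with the whole of `lookup`,
# and the pieces are stacked in group order.
by_group = pd.merge_ordered(readings, lookup, on="key", left_by="group")
print(by_group)

# Passing both left_by and right_by is rejected.
try:
    pd.merge_ordered(readings, lookup, on="key",
                     left_by="group", right_by="key")
    both_rejected = False
except ValueError:
    both_rejected = True
print(both_rejected)  # True
```

Each group yields the keys a to e in order, so the result has ten rows, and the group column is filled in even for rows that come only from lookup.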

Specifying suffixes

Consider the following DataFrames:

df_products = pd.DataFrame({"product": ["computer", "smartphone", "headphones"],
 "age": [7,8,9],
 "bought_by": ["bob", "alex", "bob"]},
 index=["A","B","C"])
df_customers = pd.DataFrame({"name":["alex","bob","cathy"], "age":[10, 20, 30]})



        [df_products]            |   [df_customers]
   product      age  bought_by   |        name   age
A  computer      7   bob         |     0  alex   10
B  smartphone    8   alex        |     1  bob    20
C  headphones    9   bob         |     2  cathy  30

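Notice how the two DataFrames share an overlapping column label: age.

By default, suffixes=("_x","_y"), which means that if the two DataFrames have overlapping column labels, the suffix "_x" is appended to the overlapping label from the left DataFrame and "_y" to the one from the right DataFrame: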

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer")



   product      age_x  bought_by  name   age_y
0  smartphone   8.0    alex       alex    10
1  computer     7.0    bob        bob     20
...

We can specify our own suffixes like so:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", suffixes=["_A","_B"])



   product     age_A  bought_by  name  age_B
0  smartphone  8.0    alex       alex  10
1  computer    7.0    bob        bob   20
...

You can also pass None in place of a string to indicate that the left or right column label should be left as is:

pd.merge_ordered(df_products, df_customers, left_on="bought_by", right_on="name", how="outer", suffixes=[None,"_B"])



   product     age  bought_by  name  age_B
0  smartphone  8.0  alex       alex  10
1  computer    7.0  bob        bob   20
...
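
The suffix rules can also be checked programmatically by inspecting the resulting column labels. This is a minimal sketch with hypothetical two-row frames (products, customers and the helper merged_columns are illustrative names, not part of pandas); it additionally shows that at least one suffix must be given when labels overlap:

```python
import pandas as pd

# Hypothetical frames that both contain an "age" column.
products = pd.DataFrame({"product": ["computer", "smartphone"],
                         "age": [7, 8],
                         "bought_by": ["bob", "alex"]})
customers = pd.DataFrame({"name": ["alex", "bob"], "age": [10, 20]})

def merged_columns(suffixes):
    """Return the column labels produced for the given suffixes."""
    merged = pd.merge_ordered(products, customers,
                              left_on="bought_by", right_on="name",
                              suffixes=suffixes)
    return list(merged.columns)

default_cols = merged_columns(("_x", "_y"))      # both sides renamed
left_kept_cols = merged_columns((None, "_B"))    # left label kept as is
print(default_cols)
print(left_kept_cols)

# With both suffixes set to None the overlap cannot be resolved.
try:
    merged_columns((None, None))
    both_none_fails = False
except ValueError:
    both_none_fails = True
print(both_none_fails)  # True
```

Only the overlapping label age is affected; product, bought_by and name are unique to one side and keep their labels whatever suffixes are passed.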

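Note: This article was selected and compiled by 纯净天空 from the original English work "Pandas | merge_ordered method" by Isshin Inada. Unless otherwise stated, copyright of the original code belongs to the original author. Do not reproduce or copy this translation without permission or authorization.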