Python merge_asof method: usage and code examples

The Pandas merge_asof(~) method performs a left join on two DataFrames in which the join keys are matched by proximity rather than by equality.

Warning

Both the left and the right DataFrame must be sorted by the join key.
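The sorting requirement can be checked with a small experiment. The following is a minimal sketch using hypothetical frames (the names left, right and key are illustrative and not part of the original examples): an unsorted join key is rejected, and sorting both frames first makes the call valid.

```python
import pandas as pd

# Hypothetical data: the left keys are deliberately out of order.
left = pd.DataFrame({"key": [9, 3, 5], "A": [4, 2, 3]})
right = pd.DataFrame({"key": [2, 5, 10], "C": [7, 8, 1]})

try:
    pd.merge_asof(left, right, on="key")
    raised = False
except ValueError as err:
    # pandas refuses to join on keys that are not sorted
    raised = True
    print("merge_asof failed:", err)

# Sorting both frames by the join key first makes the call valid.
left_sorted = left.sort_values("key")
right_sorted = right.sort_values("key")
merged = pd.merge_asof(left_sorted, right_sorted, on="key")
print(merged)
```

Sorting with sort_values before the call is usually the simplest way to satisfy the requirement when the order of the input is not guaranteed.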

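Parameters

1. left | DataFrame

The left DataFrame of the join.

2. right | DataFrame

The right DataFrame of the join.

3. on | string

The label of the column to join on. The label must be present in both left and right.

Note

The on parameter is there only for convenience. If the columns to join on have different labels, you must use a combination of left_on, right_on, left_index or right_index instead.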

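4. left_on | string or array-like

The label of the column in left to join on.

5. right_on | string or array-like

The label of the column in right to join on.

6. left_index | boolean | optional

Whether to join on the index of the left DataFrame. By default, left_index=False.

7. right_index | boolean | optional

Whether to join on the index of the right DataFrame. By default, right_index=False.

Note

Many textbooks and documentation pages use the terms merge key or join key to refer to the columns on which the join is performed.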
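As an illustration of the key-selection parameters above, here is a minimal sketch assuming two hypothetical frames whose join keys are labelled differently (t_left and t_right are made-up names). It joins once through left_on and right_on, and once through the indexes.

```python
import pandas as pd

# Hypothetical frames whose join keys carry different labels.
left = pd.DataFrame({"t_left": [3, 5, 9], "A": [2, 3, 4]})
right = pd.DataFrame({"t_right": [2, 5, 10], "C": [7, 8, 1]})

# Join on differently named columns.
by_columns = pd.merge_asof(left, right, left_on="t_left", right_on="t_right")
print(by_columns)

# Join on the indexes instead of on columns.
left_idx = left.set_index("t_left")
right_idx = right.set_index("t_right")
by_index = pd.merge_asof(left_idx, right_idx, left_index=True, right_index=True)
print(by_index)
```

Both calls pair each left key with the closest right key that is not larger than it, so column C holds the same values in either result.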

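8. by | string or list<string> | optional

The labels of the columns that must additionally match for a row to be joined. Just like on, by must be present in both left and right.

9. left_by | string | optional

The label of the column in left on which to perform the additional match. See the examples below for clarification.

10. right_by | string | optional

The label of the column in right on which to perform the additional match. See the examples below for clarification.

Warning

If left_by is specified, then right_by must be specified as well, and vice versa.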

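11. suffixes | tuple of (string, string) | optional

The suffixes to append to duplicate column labels in the resulting DataFrame. You can pass None in place of either string in suffixes to indicate that the left or right column label should be kept as is. By default, suffixes=("_x", "_y").

12. tolerance | int or Timedelta | optional

The maximum acceptable difference between a pair of join keys. By default, tolerance=None, which places no limit on the difference.

13. allow_exact_matches | boolean | optional

Whether to allow exact matches between a pair of join keys. By default, allow_exact_matches=True.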

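14. direction | string | optional

The direction of the proximity match:

Value	Description
`"backward"`	Matches the last row in right whose join key is less than (or equal to) the left join key.
`"forward"`	Matches the first row in right whose join key is greater than (or equal to) the left join key.
`"nearest"`	Matches the row in right whose join key is closest to the left join key, regardless of which of the two is larger.

By default, direction="backward". Note that the "(or equal to)" part depends on allow_exact_matches.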
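To make the last three parameters more concrete, the sketch below runs the same join under each direction and then with allow_exact_matches=False and an integer tolerance. The frames are hypothetical and chosen only for illustration; the values noted in the comments follow from the matching rules described above.

```python
import pandas as pd

# Hypothetical frames, both already sorted by the join key B.
left = pd.DataFrame({"B": [3, 5, 9], "A": [2, 3, 4]})
right = pd.DataFrame({"B": [2, 5, 10], "C": [7, 8, 1]})

# Closest right key that is <= the left key: 3->2, 5->5, 9->5
backward = pd.merge_asof(left, right, on="B", direction="backward")

# Closest right key that is >= the left key: 3->5, 5->5, 9->10
forward = pd.merge_asof(left, right, on="B", direction="forward")

# Closest right key on either side: 3->2, 5->5, 9->10
nearest = pd.merge_asof(left, right, on="B", direction="nearest")

# Exact matches disallowed, so 5 falls back to the smaller key 2.
strict = pd.merge_asof(left, right, on="B", allow_exact_matches=False)

# Keys more than 2 apart are not matched, so 9 (closest is 5) gets NaN.
within_two = pd.merge_asof(left, right, on="B", tolerance=2)

for name, result in [("backward", backward), ("forward", forward),
                     ("nearest", nearest), ("strict", strict),
                     ("within_two", within_two)]:
    print(name, result["C"].tolist())
```

When a left row has no acceptable partner, as with tolerance=2 here, the columns coming from right are filled with NaN rather than the row being dropped, which is consistent with the left-join behaviour.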

Return value

The merged DataFrame.

Examples

Basic usage

Consider the following two DataFrames:

import pandas as pd

df = pd.DataFrame({"A":[2,3,4],"B":[3,5,9]}, index=["a","b","c"])
df_other = pd.DataFrame({"B":[2,5,10],"C":[7,8,1]}, index=["d","e","f"])



   A  B   |     B   C
a  2  3   |  d  2   7
b  3  5   |  e  5   8
c  4  9   |  f  10  1

Performing the join on column B:

pd.merge_asof(df, df_other, on="B")



   A  B  C
0  2  3  7
1  3  5  8
2  4  9  8

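Note the following:

The original column B of the left DataFrame appears in the resulting DataFrame.
The values in the two B columns do not match exactly: [3,5,9] and [2,5,10].
Since the value 3 does not exist in the right join key, the method looks for the closest value that is smaller than 3 (direction="backward"), which is 2 in this case. The corresponding value in column C is 7, which is why we see 7 in the first row.
Similarly, the value 9 does not exist in the right join key, so the closest match smaller than 9 is 5. The corresponding value in column C for this match is 8, so we end up with 8 in the bottom-right entry.
on="B" can actually be omitted here, because the method infers the join key when left and right share a single overlapping column label.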
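The backward search described in the notes above can be reproduced by hand. The following sketch uses numpy.searchsorted purely as an illustration of the matching rule; it is not a claim about how pandas implements merge_asof internally, and the helper names (left_keys, right_keys, pos, matched_c) are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 3, 4], "B": [3, 5, 9]}, index=["a", "b", "c"])
df_other = pd.DataFrame({"B": [2, 5, 10], "C": [7, 8, 1]}, index=["d", "e", "f"])

left_keys = df["B"].to_numpy()
right_keys = df_other["B"].to_numpy()

# For each left key, find the position of the last right key that is
# less than or equal to it. A result of -1 would mean "no such key".
pos = np.searchsorted(right_keys, left_keys, side="right") - 1

# Look up column C at the matched positions (None where nothing matched).
matched_c = [int(df_other["C"].iloc[p]) if p >= 0 else None for p in pos]
print(matched_c)

# The hand-rolled result agrees with merge_asof for this data.
print(pd.merge_asof(df, df_other, on="B")["C"].tolist())
```

Because the search relies on the right keys being in ascending order, this also hints at why both frames have to be sorted by the join key.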

Specifying left_by

Consider the following DataFrames:

df = pd.DataFrame({"A":[7,9],"B":[3,5]}, index=["a","b"])
df_other = pd.DataFrame({"B":[2,5],"C":[7,8]}, index=["d","e"])



   A  B    |     B  C
a  7  3    |  d  2  7
b  9  5    |  e  5  8

By default, without specifying by, left_by or right_by:

pd.merge_asof(df, df_other, on="B")



   A  B  C
0  7  3  7
1  9  5  8

We can restrict the matches to rows whose values in the specified columns are also equal:

pd.merge_asof(df, df_other, on="B", left_by="A", right_by="C")



   A  B  C
0  7  3  7.0
1  9  5  NaN

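Note the following:

The first row still returns 7.0, because the value in column A matches the value in column C (both are 7).
We get NaN in the second row, because the value in column A (9) does not match the value in column C (8).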
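When the additional matching column has the same label in both DataFrames, the by parameter can stand in for the left_by and right_by pair. The sketch below assumes hypothetical trade and quote data (the column names time, ticker, qty and bid and all values are invented for illustration) and also shows a Timedelta tolerance on a datetime join key.

```python
import pandas as pd

# Hypothetical trade and quote data, both sorted by the join key "time".
trades = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 09:00:01",
                            "2024-01-01 09:00:03",
                            "2024-01-01 09:00:10"]),
    "ticker": ["AAA", "BBB", "AAA"],
    "qty": [10, 20, 30],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 09:00:00",
                            "2024-01-01 09:00:02",
                            "2024-01-01 09:00:04"]),
    "ticker": ["AAA", "BBB", "AAA"],
    "bid": [100.0, 50.0, 101.0],
})

# Each trade is matched to the latest earlier quote of the same ticker.
matched = pd.merge_asof(trades, quotes, on="time", by="ticker")
print(matched)

# Same join, but quotes older than two seconds are ignored, so the
# last trade (six seconds after its closest quote) gets NaN for bid.
recent_only = pd.merge_asof(trades, quotes, on="time", by="ticker",
                            tolerance=pd.Timedelta("2s"))
print(recent_only)
```

Without by="ticker", the second trade would be free to pick up a quote belonging to a different ticker, which is the kind of mismatch the additional match is meant to prevent.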

Specifying suffixes

Consider the following two DataFrames:

df = pd.DataFrame({"A":[2,3],"B":[3,5]}, index=["a","b"])
df_other = pd.DataFrame({"B":[2,5],"A":[7,8]}, index=["d","e"])



   A  B   |     B  A
a  2  3   |  d  2  7
b  3  5   |  e  5  8

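By default, suffixes=("_x","_y"), which means that if duplicate column labels arise in the resulting DataFrame, "_x" is appended to the overlapping label from the left and "_y" to the one from the right: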

pd.merge_asof(df, df_other, on="B")



   A_x  B  A_y
0   2   3   7
1   3   5   8

We can specify our own suffixes like so:

pd.merge_asof(df, df_other, on="B", suffixes=["_X","_Y"])



   A_X  B  A_Y
0   2   3   7
1   3   5   8

You can pass None in place of one of the strings to keep the original name of the overlapping label on the left or right side:

pd.merge_asof(df, df_other, on="B", suffixes=["_X",None])



   A_X  B   A
0   2   3   7
1   3   5   8

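Note: This article was selected and compiled by 純淨天空 from the original English work "Python | merge_asof method" by Isshin Inada. Unless otherwise stated, copyright of the original code belongs to the original author; please do not repost or copy this translation without permission or authorization.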