Python Pandas concat Method: Usage and Code Examples

The Pandas concat(~) method concatenates a list of Series or DataFrames either vertically or horizontally.

Parameters

1. objs | list-like or map-like of Series or DataFrame

The Series or DataFrames to stack vertically or horizontally.

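2. axis | int or string | optional

Whether to concatenate vertically or horizontally:

Axis	Description
`0` or `"index"`	Concatenate vertically (rows are stacked).
`1` or `"columns"`	Concatenate horizontally (columns are placed side by side).

By default, axis=0.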

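3. join | string | optional

Whether to perform an inner join or an outer (full) join on the labels of the other axis:

"inner": perform an inner join, keeping only the labels shared by all objects
"outer": perform an outer join, keeping the labels from all objects

By default, join="outer".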

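4. ignore_index | boolean | optional

If True, the labels along the concatenation axis are discarded and replaced with 0,1,...,n-1, where n is the length of that axis (the number of rows when axis=0). By default, ignore_index=False.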

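5. keys | sequence | optional

Used to construct a hierarchical index, with one key per object in objs. By default, keys=None.

6. levels | list<sequence> | optional

The levels used to construct the multi-index. By default, the levels are inferred from keys.

7. names | list<string> | optional

The labels assigned to the levels of the resulting hierarchical index. By default, names=None.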

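8. verify_integrity | boolean | optional

If True, an error is raised when the resulting Series/DataFrame contains duplicate labels along the concatenation axis (index labels for axis=0, column labels for axis=1). This check can be computationally expensive. By default, verify_integrity=False.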

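9. sort | boolean | optional

Whether to sort the non-concatenation axis. This applies only to join="outer", not to join="inner". By default, sort=False.

10. copy | boolean | optional

Whether to return a new Series/DataFrame or to reuse the provided objs where possible. By default, copy=True.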

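Return value

The return type depends on the arguments:

When axis=0 and all the objects are Series, a Series is returned.
When the concatenation involves at least one DataFrame, a DataFrame is returned.
When axis=1, a DataFrame is returned.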
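The three rules above can be checked directly. The following is a minimal sketch; the variable names (frame, series_vertical and so on) are illustrative and do not come from the original article.

```python
import pandas as pd

s1 = pd.Series([1, 2])
s2 = pd.Series([3, 4])
frame = pd.DataFrame({"A": [5, 6]})  # illustrative DataFrame

series_vertical = pd.concat([s1, s2])            # axis=0, Series only
with_frame = pd.concat([s1, frame])              # axis=0, one DataFrame involved
series_horizontal = pd.concat([s1, s2], axis=1)  # axis=1

# Print the class name of each result
for result in (series_vertical, with_frame, series_horizontal):
    print(type(result).__name__)
```

Only the first call, which stacks Series along axis=0, is expected to give back a Series.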

Examples

Consider the following DataFrames:

import pandas as pd

df = pd.DataFrame({"A":[2,3],"B":[4,5]})
df_other = pd.DataFrame({"A":[6,7],"B":[8,9]})



   A  B  |     A  B
0  2  4  |  0  6  8
1  3  5  |  1  7  9

Concatenating multiple DataFrames vertically

To concatenate multiple DataFrames vertically:

pd.concat([df, df_other])   # axis=0



   A  B
0  2  4
1  3  5
0  6  8
1  7  9
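Note that the row labels 0 and 1 each appear twice in this result. As a small sketch of what that implies (the names stacked and rows_at_zero are illustrative), a label-based lookup on a repeated label returns every matching row:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "B": [8, 9]})

stacked = pd.concat([df, df_other])

# The label 0 occurs once per input, so .loc[0] returns two rows
rows_at_zero = stacked.loc[0]
print(stacked.index.is_unique)
print(rows_at_zero)
```

If unique row labels are needed, the ignore_index and verify_integrity parameters described on this page are the relevant options.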

Concatenating multiple DataFrames horizontally

To concatenate multiple DataFrames horizontally, pass axis=1 like so:

pd.concat([df, df_other], axis=1)



   A  B  A  B
0  2  4  6  8
1  3  5  7  9
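The horizontal result contains the column labels A and B twice. One possible way to tell them apart, sketched here under the assumption that a two-level column index is acceptable, is to pass keys together with axis=1. The key names "left" and "right" are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "B": [8, 9]})

# keys adds an outer column level that identifies the source DataFrame
wide = pd.concat([df, df_other], axis=1, keys=["left", "right"])

print(wide.columns.tolist())
print(wide["right"]["A"].tolist())
```

Selecting wide["right"] gives back the columns that came from df_other.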

Specifying the join type

Consider the following DataFrames:

df = pd.DataFrame({"A":[2],"B":[3]})
df_other = pd.DataFrame({"B":[4],"C":[5]})



   A  B   |     B  C
0  2  3   |  0  4  5

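Here, both DataFrames have a column B.

Outer join

By default, join="outer", which means that all the columns appear in the resulting DataFrame, and columns with the same label are stacked:

pd.concat([df, df_other], join="outer")   # same as the default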



   A    B  C
0  2.0  3  NaN
0  NaN  4  5.0

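We get NaN for some entries because only column B is shared between the DataFrames, so only its values can be stacked. Columns A and C each exist in just one DataFrame, so NaN is inserted as a filler for the rows that come from the other DataFrame.

Inner join

To perform an inner join, set join="inner" like so: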

pd.concat([df,df_other], join="inner")



   B
0  3
0  4

Here, only the columns that appear in every DataFrame are kept in the result. Since B is the only column shared by df and df_other, B is the only column in the output.
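The two join types can be compared side by side. The sketch below reuses the DataFrames from this section and then adds a second, hypothetical pair (left and right) to illustrate that with axis=1 the join is applied to the row labels instead of the column labels.

```python
import pandas as pd

df = pd.DataFrame({"A": [2], "B": [3]})
df_other = pd.DataFrame({"B": [4], "C": [5]})

outer = pd.concat([df, df_other], join="outer")  # union of the columns
inner = pd.concat([df, df_other], join="inner")  # intersection of the columns
print(list(outer.columns), list(inner.columns))

# Hypothetical frames: with axis=1 the join works on the row labels
left = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"B": [3, 4]}, index=[1, 2])
aligned = pd.concat([left, right], axis=1, join="inner")
print(aligned)
```

In the last call only the row label 1 is present in both inputs, so it is the only row that survives the inner join.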

Concatenating Series

Concatenating Series works the same way as concatenating DataFrames.

To concatenate two Series vertically:

s1 = pd.Series(['a','b'])
s2 = pd.Series(['c','d'])
pd.concat([s1, s2])         # returns a Series



0    a
1    b
0    c
1    d
dtype: object

To concatenate two Series horizontally:

s1 = pd.Series(['a','b'])
s2 = pd.Series(['c','d'])
pd.concat([s1, s2], axis=1)   # returns a DataFrame



   0  1
0  a  c
1  b  d
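The column labels 0 and 1 appear because the Series are unnamed. As a small sketch (the names "first" and "second" are illustrative), giving each Series a name makes that name the column label:

```python
import pandas as pd

s1 = pd.Series(["a", "b"], name="first")
s2 = pd.Series(["c", "d"], name="second")

# With axis=1, each Series name becomes a column label
named = pd.concat([s1, s2], axis=1)
print(named)
```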

Specifying ignore_index

By default, ignore_index=False, which means that the original index of each input is kept:

s1 = pd.Series([3,4], index=["a","b"])
s2 = pd.Series([5,6], index=["c","d"])
pd.concat([s1, s2])



a    3
b    4
c    5
d    6
dtype: int64

To reset the index to the default integer index:

s1 = pd.Series([3,4], index=["a","b"])
s2 = pd.Series([5,6], index=["c","d"])
pd.concat([s1, s2], ignore_index=True)



0    3
1    4
2    5
3    6
dtype: int64
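ignore_index acts on the concatenation axis, so with DataFrames and axis=1 it is the column labels that are replaced. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "B": [8, 9]})

rows_reset = pd.concat([df, df_other], ignore_index=True)          # row labels reset
cols_reset = pd.concat([df, df_other], axis=1, ignore_index=True)  # column labels reset

print(list(rows_reset.index))
print(list(cols_reset.columns))
```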

Specifying keys

To form a multi-index, specify the keys parameter:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], keys=["A","B"])



A  0    a
   1    b
B  0    c
   1    d
dtype: object

To add more levels, pass tuples like so:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], keys=[("A","B"),("C","D")])



A  B  0    a
      1    b
C  D  0    c
      1    d
dtype: object
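Once keys are attached, each original object can be recovered by its key. Since objs may also be map-like, a dict whose keys play the same role is a possible alternative. The following is a sketch with illustrative variable names:

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

by_list = pd.concat([s1, s2], keys=["A", "B"])
by_dict = pd.concat({"A": s1, "B": s2})  # dict keys are used as the keys

print(by_list.loc["B"].tolist())  # all values under key "B"
print(by_list.loc[("A", 1)])      # a single value: key "A", original label 1
```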

Specifying names

The names parameter assigns labels to the index levels of the resulting Series/DataFrame:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], keys=["A","B"], names=["Groups"])



Groups   
A       0    a
        1    b
B       0    c
        1    d
dtype: object

Here, the label "Groups" is assigned to the outer level of the Series' index, which is the level created by keys.
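A name can also be given to the inner level by passing a second entry in names. In the sketch below the label "Row" is an illustrative choice, not something taken from the original example:

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# One name per index level: the keys level first, then the original index
labelled = pd.concat([s1, s2], keys=["A", "B"], names=["Groups", "Row"])

print(list(labelled.index.names))
print(labelled.xs("A", level="Groups").tolist())  # select by level name
```

Named levels can then be referred to by name, as in the xs call above.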

Specifying verify_integrity

By default, verify_integrity=False, which means that duplicate index and column labels are allowed:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2])         # verify_integrity=False



0    a
1    b
0    c
1    d
dtype: object

Notice how the index labels 0 and 1 each appear twice.

In this case, setting verify_integrity=True raises an error:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], verify_integrity=True)



ValueError: Indexes have overlapping values: Int64Index([0, 1], dtype='int64')

If you want to guarantee that the resulting Series/DataFrame has a unique index, consider setting ignore_index=True.
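In a script, the error can be caught instead of stopping execution. The following is a minimal sketch; the names raised, message and safe are illustrative, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

try:
    pd.concat([s1, s2], verify_integrity=True)
    raised = False
    message = ""
except ValueError as exc:
    # Both inputs use the labels 0 and 1, so the check fails
    raised = True
    message = str(exc)
print(raised, message)

# With ignore_index=True the new labels are 0..3, so the check passes
safe = pd.concat([s1, s2], ignore_index=True, verify_integrity=True)
print(safe.index.tolist())
```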

Specifying sort

By default, sort=False, which means that the resulting column labels or index are not sorted:

df = pd.DataFrame({"C":[2,3],"B":[4,5]})
df_other = pd.DataFrame({"A":[6,7],"D":[8,9]})
pd.concat([df, df_other])      # axis=0



   C    B    A    D 
0  2.0  4.0  NaN  NaN
1  3.0  5.0  NaN  NaN
0  NaN  NaN  6.0  8.0
1  NaN  NaN  7.0  9.0

Note that the columns are not sorted by column label.

When axis=0 and sort=True, the columns are sorted by column label:

df = pd.DataFrame({"C":[2,3],"B":[4,5]})
df_other = pd.DataFrame({"A":[6,7],"D":[8,9]})
pd.concat([df, df_other], sort=True)



   A    B    C    D
0  NaN  4.0  2.0  NaN
1  NaN  5.0  3.0  NaN
0  6.0  NaN  NaN  8.0
1  7.0  NaN  NaN  9.0

When axis=1 and sort=True, the rows are sorted by row label:

df = pd.DataFrame({"C":[2,3],"B":[4,5]}, index=[3,2])
df_other = pd.DataFrame({"A":[6,7],"D":[8,9]}, index=[1,4])
pd.concat([df, df_other], axis=1, sort=True)



   C    B    A    D
1  NaN  NaN  6.0  8.0
2  3.0  5.0  NaN  NaN
3  2.0  4.0  NaN  NaN
4  NaN  NaN  7.0  9.0
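The effect of sort on the column order can be confirmed by comparing the column labels directly. This sketch reuses the first pair of DataFrames from this section; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"C": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "D": [8, 9]})

unsorted_cols = list(pd.concat([df, df_other]).columns)           # order of appearance
sorted_cols = list(pd.concat([df, df_other], sort=True).columns)  # sorted by label

print(unsorted_cols)
print(sorted_cols)
```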

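Note: This article was selected and compiled by 純淨天空 from the original English work "Pandas | concat method" by Isshin Inada. Unless otherwise stated, copyright of the original code belongs to the original author. Do not reprint or copy this translation without permission.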