Python Pandas concat method: usage and code examples

The Pandas concat(~) method concatenates a list of Series or DataFrames either horizontally or vertically.

Parameters

1. objs | list-like or map-like of Series or DataFrame

The Series or DataFrames to stack horizontally or vertically.

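2. axis | int or string | optional

Whether to concatenate vertically or horizontally:

Axis	Description
`0` or `"index"`	Concatenate vertically (rows are stacked).
`1` or `"columns"`	Concatenate horizontally (columns are placed side by side).

By default, axis=0.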

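3. join | string | optional

Whether to perform an inner join or an outer (full) join on the non-concatenation axis:

"inner": performs an inner join
"outer": performs an outer join

By default, join="outer".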

4. ignore_index | boolean | optional

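If True, the original labels along the concatenation axis are discarded and replaced by 0,1,...,n-1, where n is the length of that axis (the number of rows when axis=0). By default, ignore_index=False.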

5. keys | sequence | optional

The keys used to construct a hierarchical index, one per object in objs. By default, keys=None.

6. levels | list<sequence> | optional

The levels used to construct the MultiIndex. By default, the levels are inferred from keys.

7. names | list<string> | optional

The labels to assign to the levels of the resulting hierarchical index. By default, names=None.

8. verify_integrity | boolean | optional

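If True, an error is raised when the resulting Series/DataFrame contains duplicate labels along the concatenation axis. This check can be computationally expensive. By default, verify_integrity=False.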

9. sort | boolean | optional

Whether to sort the non-concatenation axis. This applies only when join="outer"; it has no effect when join="inner". By default, sort=False.

10. copy | boolean | optional

Whether to return a new Series/DataFrame or to reuse the provided objs where possible. By default, copy=True.

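Return value

The return type depends on the inputs and on axis:

When axis=0 and every object being concatenated is a Series, a Series is returned.
When the concatenation involves at least one DataFrame, a DataFrame is returned.
When axis=1, a DataFrame is returned.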
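The three rules above can be checked directly. The following is a minimal sketch; the variable names (series_rows, mixed_rows, series_cols) are chosen here for illustration only.

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])
df = pd.DataFrame({"A": [2, 3]})

# Series + Series along axis=0
series_rows = pd.concat([s1, s2])
# At least one DataFrame among the inputs
mixed_rows = pd.concat([s1, df])
# Series + Series along axis=1
series_cols = pd.concat([s1, s2], axis=1)

print(type(series_rows).__name__)   # Series
print(type(mixed_rows).__name__)    # DataFrame
print(type(series_cols).__name__)   # DataFrame
```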

Examples

Consider the following DataFrames:

import pandas as pd

df = pd.DataFrame({"A":[2,3],"B":[4,5]})
df_other = pd.DataFrame({"A":[6,7],"B":[8,9]})



   A  B  |     A  B
0  2  4  |  0  6  8
1  3  5  |  1  7  9

Concatenating multiple DataFrames vertically

To concatenate multiple DataFrames vertically:

pd.concat([df, df_other])   # axis=0



   A  B
0  2  4
1  3  5
0  6  8
1  7  9

Concatenating multiple DataFrames horizontally

To concatenate multiple DataFrames horizontally, pass axis=1 like so:

pd.concat([df, df_other], axis=1)



   A  B  A  B
0  2  4  6  8
1  3  5  7  9

Specifying join

Consider the following DataFrames:

df = pd.DataFrame({"A":[2],"B":[3]})
df_other = pd.DataFrame({"B":[4],"C":[5]})



   A  B   |     B  C
0  2  3   |  0  4  5

Here, both DataFrames have a column B.

Outer join

By default, join="outer", which means that all columns appear in the resulting DataFrame, and columns with the same label are stacked:

pd.concat([df, df_other], join="outer")



   A    B  C
0  2.0  3  NaN
0  NaN  4  5.0

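Some entries are NaN because only column B is shared between the DataFrames, so its values are stacked. Columns A and C each exist in only one DataFrame, so NaN is inserted as a filler for the rows that come from the other DataFrame.

Inner join

To perform an inner join, set join="inner" like so: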

pd.concat([df,df_other], join="inner")



   B
0  3
0  4

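Here, only the columns that appear in all DataFrames appear in the resulting DataFrame. Since column B is the only column shared between df and df_other, the output contains only column B.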
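The join parameter acts on the non-concatenation axis, so with axis=1 it applies to row labels rather than column labels. The following is a minimal sketch of that case; the DataFrames left and right and their labels are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"B": [3, 4]}, index=["y", "z"])

# Outer join keeps every row label found in either input
outer = pd.concat([left, right], axis=1)
# Inner join keeps only the row labels found in both inputs
inner = pd.concat([left, right], axis=1, join="inner")

print(list(outer.index))   # ['x', 'y', 'z']
print(list(inner.index))   # ['y']
```

In the outer result, rows x and z contain NaN for the column that did not have that label.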

Concatenating Series

Concatenating Series works the same way as concatenating DataFrames.

To concatenate two Series vertically:

s1 = pd.Series(['a','b'])
s2 = pd.Series(['c','d'])
pd.concat([s1, s2])         # returns a Series



0    a
1    b
0    c
1    d
dtype: object

To concatenate two Series horizontally:

s1 = pd.Series(['a','b'])
s2 = pd.Series(['c','d'])
pd.concat([s1, s2], axis=1)   # returns a DataFrame



   0  1
0  a  c
1  b  d

Specifying ignore_index

By default, ignore_index=False, which means that the original index of each input is kept:

s1 = pd.Series([3,4], index=["a","b"])
s2 = pd.Series([5,6], index=["c","d"])
pd.concat([s1, s2])



a    3
b    4
c    5
d    6
dtype: int64

To reset the index to the default integer index:

s1 = pd.Series([3,4], index=["a","b"])
s2 = pd.Series([5,6], index=["c","d"])
pd.concat([s1, s2], ignore_index=True)



0    3
1    4
2    5
3    6
dtype: int64
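ignore_index resets the labels along the concatenation axis, so with axis=1 it is the column labels that are replaced. The sketch below reuses the two DataFrames from the horizontal concatenation example, where the result otherwise carries the duplicated labels A, B, A, B; the name wide is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]})
df_other = pd.DataFrame({"A": [6, 7], "B": [8, 9]})

# With axis=1, the column labels are replaced and the row index is kept
wide = pd.concat([df, df_other], axis=1, ignore_index=True)

print(list(wide.columns))   # [0, 1, 2, 3]
print(list(wide.index))     # [0, 1]
```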

Specifying keys

To form a MultiIndex, specify the keys parameter:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], keys=["A","B"])



A  0    a
   1    b
B  0    c
   1    d
dtype: object

To add more levels, pass tuples like so:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], keys=[("A","B"),("C","D")])



A  B  0    a
      1    b
C  D  0    c
      1    d
dtype: object
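Since objs may also be map-like, a dict can supply the keys and the objects together. The sketch below compares that form with the explicit keys list and then uses the outer level to pull one input back out; the names from_keys and from_dict are illustrative.

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

# Explicit keys, one per input object
from_keys = pd.concat([s1, s2], keys=["A", "B"])
# Dict form: the dict keys play the role of the keys parameter
from_dict = pd.concat({"A": s1, "B": s2})

# Selecting on the outer level recovers the rows that came from s2
second_group = from_keys.loc["B"]
print(list(second_group))   # ['c', 'd']
```

Tagging each input this way keeps track of where every row came from after concatenation.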

Specifying names

The names parameter assigns labels to the index levels of the resulting Series/DataFrame:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], keys=["A","B"], names=["Groups"])



Groups   
A       0    a
        1    b
B       0    c
        1    d
dtype: object

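Here, the label "Groups" is assigned to the first (outer) level of the Series' index.

Specifying verify_integrity

By default, verify_integrity=False, which means that duplicate index and column labels are allowed: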

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2])         # verify_integrity=False



0    a
1    b
0    c
1    d
dtype: object

Notice how the index labels 0 and 1 overlap.

In this case, setting verify_integrity=True raises an error:

s1 = pd.Series(["a","b"])
s2 = pd.Series(["c","d"])
pd.concat([s1, s2], verify_integrity=True)



ValueError: Indexes have overlapping values: Int64Index([0, 1], dtype='int64')

If you want to make sure that the resulting Series/DataFrame has a unique index, consider setting ignore_index=True.
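Both approaches can be put side by side. The following is a minimal sketch: it catches the ValueError raised for overlapping labels and then builds a result with a fresh, unique index. The names overlap_error and combined are illustrative.

```python
import pandas as pd

s1 = pd.Series(["a", "b"])
s2 = pd.Series(["c", "d"])

overlap_error = None
try:
    # Both inputs carry the index labels 0 and 1, so this raises
    pd.concat([s1, s2], verify_integrity=True)
except ValueError as err:
    overlap_error = err
    print("Overlap detected:", err)

# Discarding the original labels yields a unique index
combined = pd.concat([s1, s2], ignore_index=True)
print(combined.index.is_unique)   # True
print(list(combined.index))       # [0, 1, 2, 3]
```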

Specifying sort

By default, sort=False, which means that the resulting column labels or index are not sorted:

df = pd.DataFrame({"C":[2,3],"B":[4,5]})
df_other = pd.DataFrame({"A":[6,7],"D":[8,9]})
pd.concat([df, df_other])      # axis=0



   C    B    A    D 
0  2.0  4.0  NaN  NaN
1  3.0  5.0  NaN  NaN
0  NaN  NaN  6.0  8.0
1  NaN  NaN  7.0  9.0

Notice that the columns are not sorted by column label.

When axis=0 and sort=True, the columns are sorted by column label:

df = pd.DataFrame({"C":[2,3],"B":[4,5]})
df_other = pd.DataFrame({"A":[6,7],"D":[8,9]})
pd.concat([df, df_other], sort=True)



   A    B    C    D
0  NaN  4.0  2.0  NaN
1  NaN  5.0  3.0  NaN
0  6.0  NaN  NaN  8.0
1  7.0  NaN  NaN  9.0

When axis=1 and sort=True, the rows are sorted by row label:

df = pd.DataFrame({"C":[2,3],"B":[4,5]}, index=[3,2])
df_other = pd.DataFrame({"A":[6,7],"D":[8,9]}, index=[1,4])
pd.concat([df, df_other], axis=1, sort=True)



   C    B    A    D
1  NaN  NaN  6.0  8.0
2  3.0  5.0  NaN  NaN
3  2.0  4.0  NaN  NaN
4  NaN  NaN  7.0  9.0


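Note: This article was selected and compiled by 纯净天空 from the original English work "Pandas | concat method" by Isshin Inada. Unless otherwise stated, the copyright of the original code belongs to the original author. Please do not reproduce or copy this translation without permission or authorization.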