Python Pandas DataFrame set_index Method: Usage and Code Examples

Pandas' DataFrame.set_index(~) sets the index of a DataFrame using one or more of its columns.

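1. keys | string or array-like or list<string>

The name(s) of the column(s) to use as the index. An array of the same length as the DataFrame (such as a Series, Index or NumPy array) is also accepted.

2. drop | boolean | optional

Whether to delete the columns that are used as the new index. By default, drop=True.

3. append | boolean | optional

Whether to append the columns to the existing index instead of replacing it. By default, append=False.

4. inplace | boolean | optional

Whether to modify the source DataFrame directly instead of returning a new one. By default, inplace=False.

5. verify_integrity | boolean | optional

Whether to raise an error if the new index contains duplicates. By default, verify_integrity=False.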

Returns a DataFrame with the new index, or None if inplace=True.
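Since keys also accepts array-like values, the following minimal sketch passes a pd.Index rather than a column name. The label values "x" and "y" and the name "label" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# An Index (a Series or NumPy array works too) with the same length as df
# can be used as the key instead of a column name.
# The values and the name "label" are hypothetical.
labels = pd.Index(["x", "y"], name="label")
by_labels = df.set_index(labels)

print(by_labels)
# No column is dropped, because the new index did not come from a column.
print(list(by_labels.columns))
```

Note that a plain Python list such as ["x", "y"] would be read as a list of column labels, so wrap external values in an Index, Series or array first.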

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[1,2], "B":[3,4], "C":[5,6]})
df



   A  B  C
0  1  3  5
1  2  4  6

To set column A as the index of df:

df.set_index("A")      # Returns a DataFrame



   B  C
A		
1  3  5
2  4  6

Here, the name assigned to the index is the column label, that is, "A".

To set columns A and B as the index of df:

df.set_index(["A","B"])



      C
A  B	
1  3  5
2  4  6

Here, the DataFrame ends up with a two-level index (a MultiIndex) whose levels are named A and B.
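As a small sketch of what the two-level index means in practice, the snippet below inspects the index and looks up a row by a tuple of (A, B) values. The variable name multi is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
multi = df.set_index(["A", "B"])

# The result carries a MultiIndex with one level per column passed in.
print(type(multi.index).__name__)
print(multi.index.nlevels)
print(list(multi.index.names))

# Rows are addressed by a tuple holding one value per level.
value = multi.loc[(1, 3), "C"]
print(value)
```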

To keep the column that is used as the index, set drop=False:

df.set_index("A", drop=False)



   A  B  C
A			
1  1  3  5
2  2  4  6

Notice that column A is still present.
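To make the difference concrete, here is a brief sketch comparing the default behaviour with drop=False. The names dropped and kept are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

dropped = df.set_index("A")            # default: drop=True
kept = df.set_index("A", drop=False)   # A stays as a regular column too

print("A" in dropped.columns)  # A is only available through the index
print("A" in kept.columns)     # A is available as index and as column
print(kept.index.name)
```

Keeping the column can be convenient when later code still refers to df["A"], at the cost of storing the values twice.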

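For reference, recall that df itself is unchanged: it still has the default index [0, 1] and the columns A, B and C.

To append a column to the existing index, set append=True: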

df.set_index("A", append=True)



      B  C
   A		
0  1  3  5
1  2  4  6

Notice that the original index [0, 1] is kept as the first level, and column A is appended to it as a second level.
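The following sketch checks the level names after appending and looks up a value by the (original index, A) pair. The variable name appended is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
appended = df.set_index("A", append=True)

# The original index had no name, so its level name is None.
print(list(appended.index.names))

# Look up column B for original row 0, where A equals 1.
cell = appended.loc[(0, 1), "B"]
print(cell)
```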

To set the index in place, pass inplace=True:

df.set_index("A", inplace=True)
df



   B  C
A      
1  3  5
2  4  6

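As the output above shows, setting inplace=True modifies the source DataFrame directly, and the method then returns None. Choose inplace=True only when you are sure you no longer need the source DataFrame; it may avoid an extra copy, although the memory saving is not guaranteed.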
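A common pitfall is assigning the result of an in-place call back to a variable. The sketch below shows that the call returns None, and shows reassignment as an alternative that gives the same result. The names returned and df2 are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# With inplace=True the method mutates df and returns None.
returned = df.set_index("A", inplace=True)
print(returned)
print(df.index.name)

# Alternative: keep the default inplace=False and rebind the name.
df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df2 = df2.set_index("A")
print(df.equals(df2))
```

The reassignment form also allows method chaining, which the in-place form does not.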

Consider the following DataFrame:

df = pd.DataFrame({"A":[1,1],"B":[3,4]})
df



   A  B
0  1  3
1  1  4

By default, verify_integrity=False, which means that no error is raised if the resulting index contains duplicates:

df.set_index("A")   # verify_integrity=False



   B
A   
1  3
1  4

Notice that the new index contains duplicate values (two 1s), yet no error was raised.

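To raise an error in this case, pass verify_integrity=True. The output below comes from a pandas version earlier than 2.0; newer versions print Index instead of Int64Index in the message: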

df.set_index("A", verify_integrity=True)



ValueError: Index has duplicate keys: Int64Index([1], dtype='int64', name='A')
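In a script, the error can be caught rather than left to stop the program. The sketch below wraps the call in try/except and also shows is_unique as a way to test for duplicates without raising. The flag name raised is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1], "B": [3, 4]})

try:
    df.set_index("A", verify_integrity=True)
    raised = False
except ValueError as exc:
    # The message text differs between pandas versions.
    raised = True
    print(f"Rejected: {exc}")

# Checking for duplicates up front avoids the exception altogether.
print(df["A"].is_unique)
print(df.set_index("A").index.is_unique)
```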


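Note: This article was selected and compiled by 纯净天空 from the original English work "Pandas DataFrame | set_index method" by Isshin Inada. Unless otherwise stated, the copyright of the original code belongs to the original author. Please do not reproduce or copy this translation without permission or authorization.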