Python Pandas DataFrame reindex method: usage and code examples

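The Pandas DataFrame.reindex(~) method sets a new index for the source DataFrame and uses NaN for every value whose row or column label is new. See the examples below for clarification.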

Parameters

1. labels | array-like | optional

The new labels to use. Use axis to indicate whether they are row labels or column labels.

2. index | array-like | optional

The new row labels.

3. columns | array-like | optional

The new column labels.

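4. axis | int or str | optional

Whether to apply the labels to the index or to the columns:

Value	Description
`0` or `"index"`	The labels are applied to the index (that is, they become row labels)
`1` or `"columns"`	The labels become column labels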

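Note

You can change the row or column labels in two ways:

Specify index and/or columns
Specify labels and axis

Using index or columns is preferable to using labels and axis, because the intent is clearer and the syntax is shorter.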
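
As a minimal sketch of the note above, the two calling styles can be compared side by side. The DataFrame is the same one used in the examples section; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Style 1: name the axis directly through index= / columns=
by_index = df.reindex(index=["a", "c"])
by_columns = df.reindex(columns=["B", "D"])

# Style 2: pass labels together with axis
by_labels_rows = df.reindex(labels=["a", "c"], axis=0)
by_labels_cols = df.reindex(labels=["B", "D"], axis="columns")

# DataFrame.equals treats NaN values in the same position as equal
print(by_index.equals(by_labels_rows))    # True
print(by_columns.equals(by_labels_cols))  # True
```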

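5. method | None or string | optional

The logic used to fill values for new labels:

Value	Description
`None`	Leave the missing values as they are.
`"pad"` or `"ffill"`	Use the values of the previous row/column.
`"backfill"` or `"bfill"`	Use the values of the next row/column.
`"nearest"`	Use the values of the nearest row/column.

By default, method=None. See the examples below for clarification.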

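Warning

The method parameter only takes effect when the row or column labels of the source DataFrame are monotonically increasing or monotonically decreasing.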
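
The following sketch illustrates the warning with a hypothetical DataFrame (unsorted_df, not part of the original examples) whose row labels are not monotonic. In the pandas versions this was checked against, the call raises a ValueError; the exact message may differ between versions. Sorting the index first is one possible workaround.

```python
import pandas as pd

# Row labels 3, 1, 2 are neither increasing nor decreasing
unsorted_df = pd.DataFrame({"A": [10, 20, 30]}, index=[3, 1, 2])

try:
    unsorted_df.reindex(index=[0, 1, 2, 3, 4], method="ffill")
    raised = False
except ValueError as err:
    raised = True
    print("reindex failed:", err)

# After sorting, the labels are monotonically increasing and ffill works
filled = unsorted_df.sort_index().reindex(index=[0, 1, 2, 3, 4], method="ffill")
print(filled)
```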

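6. copy | boolean | optional

Whether to return a new DataFrame even when the new labels are identical to the existing ones. Note that reindex(~) does not modify the source DataFrame in place. By default, copy=True.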

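7. level | int or string | optional

The level to target. This is only relevant when the source DataFrame has a MultiIndex.

8. fill_value | scalar | optional

The value used for entries whose label is new. By default, fill_value=NaN.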
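
A short sketch of fill_value, reusing the DataFrame from the examples section. The choice of 0 as the fill value is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Row "c" does not exist in df, so its entries take the fill value
filled = df.reindex(index=["a", "c"], fill_value=0)
print(filled)
#    A  B
# a  2  4
# c  0  0
```

Because no NaN is introduced here, the columns keep their integer type.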

9. limit | int | optional

The maximum number of consecutive new labels to forward-fill or backward-fill. By default, limit=None.
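
The effect of limit is easiest to see on a gap of several new labels. The sketch below uses a hypothetical DataFrame (sparse_df) with rows 1 and 5 only, and allows at most one consecutive forward fill.

```python
import pandas as pd

sparse_df = pd.DataFrame({"A": [10, 50]}, index=[1, 5])

# Labels 2, 3 and 4 are new; only the first of the run is filled from row 1
limited = sparse_df.reindex(index=[1, 2, 3, 4, 5], method="ffill", limit=1)
print(limited)
```

With limit=1, row 2 receives the value of row 1, while rows 3 and 4 are left as NaN.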

10. tolerance | scalar or list | optional

Filling is performed only where the following condition holds:

abs(index[indexer] - target) <= tolerance

Specifying tolerance without specifying method results in an error. By default, tolerance=None.

Return value

A DataFrame with updated row or column labels.

Examples

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df

   A  B
a  2  4
b  3  5

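Changing row labels

To change the index (that is, the row labels) to "a" and "c":

df.reindex(index=["a","c"])

   A    B
a  2.0  4.0
c  NaN  NaN

Note the following:

The values at [aA] and [aB] are kept as they are, because both [aA] and [aB] exist in the source DataFrame.
The values at [cA] and [cB] are NaN, because [cA] and [cB] do not exist in the source DataFrame.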
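
Two details of the output above can be checked directly: the source DataFrame is left unchanged, and the columns become floats because integer columns cannot hold NaN. A minimal sketch, with result as an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])
result = df.reindex(index=["a", "c"])

# reindex returns a new DataFrame; df still has rows "a" and "b"
print(list(df.index))

# Columns that now contain NaN are stored as float64
print(result.dtypes)

# True marks the entries created for the new label "c"
print(result.isna())
```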

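Changing column labels

Here is our df from before:

df

   A  B
a  2  4
b  3  5

To set new column labels:

df.reindex(columns=["B","D"])

   B   D
a  4 NaN
b  5 NaN

Note the following:

The values at [aB] and [bB] are kept as they are, because both [aB] and [bB] exist in the source DataFrame.
The values at [aD] and [bD] are NaN, because [aD] and [bD] do not exist in the source DataFrame.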
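
Since index and columns are separate parameters, the row and column examples can also be combined in one call. The sketch below assumes the same df; the name both is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Only [aB] exists in the source; every other entry has a new row or column label
both = df.reindex(index=["a", "c"], columns=["B", "D"])
print(both)
```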

Specifying method

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["b","d"])
df

   A  B
b  2  4
d  3  5

None

By default, method=None, which means that no filling is performed, so values with a new row or column label are NaN:

df.reindex(index=["a","c"])

   A    B
a  NaN  NaN
c  NaN  NaN

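ffill

To fill using the previous values, pass method="ffill" like so:

df.reindex(index=["a","c"], method="ffill")

   A    B
a  NaN  NaN
c  2.0  4.0

Note the following:

Index "a" still has NaN because no index is smaller than "a": the source DataFrame contains the indexes "b" and "d", both of which are larger than "a".
The values at index "c" are filled with the values at index "b" of the source DataFrame, because "b" is the last index that is smaller than "c".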

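bfill

For reference, here is df again:

df

   A  B
b  2  4
d  3  5

To fill using the next values, pass method="bfill" like so:

df.reindex(index=["a","c"], method="bfill")

   A  B
a  2  4
c  3  5

Note the following:

The values in row "a" are filled with the values in row "b" of the source DataFrame, because the next index larger than "a" is "b".
Exactly the same reasoning applies to how index "c" is filled, which takes its values from row "d".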
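
The method table also lists "nearest", which has no worked example here. The following is a minimal sketch under the assumption of a numeric index, since nearest needs to measure the distance between labels; num_df is a hypothetical DataFrame and the target labels are chosen to avoid ties.

```python
import pandas as pd

num_df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=[3, 6])

# 4 is closer to 3, while 5 and 7 are closer to 6
nearest = num_df.reindex(index=[4, 5, 7], method="nearest")
print(nearest)
```

Row 4 takes its values from row 3 of the source, and rows 5 and 7 take theirs from row 6.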

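Specifying tolerance

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[3,6])
df

   A  B
3  2  4
6  3  5

Suppose we want to set a new index [5,7] with forward filling. We can specify tolerance to control whether the forward fill takes effect:

df.reindex(index=[5,7], method="ffill", tolerance=1)

   A    B
5  NaN  NaN
7  3.0  5.0

Note the following:

The row with index 5 has NaN, because abs(3-5)=2 is greater than the specified tolerance.
The row with index 7 is forward-filled from index 6 of the source DataFrame, because abs(6-7)=1 is less than or equal to the specified tolerance.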
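
The parameter list states that tolerance may also be a list. As a sketch, assuming the list must contain one tolerance per new label, the outcome of the example above can be reversed by allowing a distance of 2 for label 5 and 0 for label 7:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=[3, 6])

# One tolerance per target label: 2 for label 5, 0 for label 7
per_label = df.reindex(index=[5, 7], method="ffill", tolerance=[2, 0])
print(per_label)
```

Here abs(3-5)=2 is within the tolerance of 2, so row 5 is filled from row 3, while abs(6-7)=1 exceeds the tolerance of 0, so row 7 stays NaN.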

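Note: this article was selected and compiled by 纯净天空 from Pandas DataFrame | reindex method, an original English work by Isshin Inada. Unless otherwise stated, the copyright of the original code belongs to the original author. Do not reprint or copy this translation without permission or authorization.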
