Python Pandas DataFrame reindex Method: Usage and Code Examples

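Pandas DataFrame.reindex(~) method sets a new index for the source DataFrame and fills values with NaN wherever a row or column label is new. See the examples below for clarification.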

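Parameters

1. labels | array-like | optional

The new labels to set as the index. Use axis to indicate whether the new labels are for the rows or the columns.

2. index | array-like | optional

The new row labels.

3. columns | array-like | optional

The new column labels.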

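4. axis | int or str | optional

Whether to apply the labels to the index or the columns:

Value	Description
`0` or `"index"`	The labels will be applied to the index (i.e. row labels)
`1` or `"columns"`	The labels will become the column labels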

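Note

You can change the labels of the index or columns in two ways:

By specifying index and/or columns
By specifying labels and axis

Using the index or columns parameters is preferable to using labels and axis, because the intent is clearer and the syntax is shorter.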
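As a minimal sketch of the two styles described in this note, the snippet below reindexes the columns of a small DataFrame both ways and checks that the results match. The variable names by_keyword and by_axis are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Style 1: name the target axis through the columns keyword.
by_keyword = df.reindex(columns=["B", "D"])

# Style 2: pass the labels together with an explicit axis.
by_axis = df.reindex(labels=["B", "D"], axis="columns")

# Both calls describe the same operation, so the results are equal
# (DataFrame.equals treats NaN values in the same position as equal).
same = by_keyword.equals(by_axis)
print(same)
```

The keyword style keeps the target axis visible at the call site, which is the reason the note recommends it.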

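5. method | None or string | optional

The logic to use when filling missing values:

Value	Description
`None`	Leave the missing values as they are.
`"pad"` or `"ffill"`	Use the value from the previous row/column.
`"backfill"` or `"bfill"`	Use the value from the next row/column.
`"nearest"`	Use the value from the nearest row/column.

By default, method=None. See the examples below for clarification.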
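The worked examples in this article cover None, "ffill" and "bfill". As a rough sketch of the remaining option, method="nearest", the snippet below uses a numeric index so that distances between labels are well defined. The index values are picked for illustration and deliberately avoid ties.

```python
import pandas as pd

df_num = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=[3, 6])

# Each new label takes the row of the closest source label:
#   4 is closest to 3, 5 is closest to 6, 7 is closest to 6.
nearest = df_num.reindex(index=[4, 5, 7], method="nearest")
print(nearest)
```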

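Warning

The method parameter works only when the row or column labels of the source DataFrame are monotonically increasing or monotonically decreasing.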
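To illustrate this warning, here is a small sketch with a deliberately unsorted index. It assumes that pandas reports the problem as a ValueError, and it shows one possible workaround, sorting the index first. The names unsorted_df and sorted_result are hypothetical.

```python
import pandas as pd

# The index "b", "a", "c" is neither increasing nor decreasing.
unsorted_df = pd.DataFrame({"A": [2, 3, 4]}, index=["b", "a", "c"])

try:
    unsorted_df.reindex(index=["a", "d"], method="ffill")
    raised = False
except ValueError as err:
    raised = True
    print("reindex failed:", err)

# Sorting the index first makes the labels monotonically increasing,
# so forward-filling becomes possible.
sorted_result = unsorted_df.sort_index().reindex(index=["a", "d"], method="ffill")
print(sorted_result)
```

After sorting, "a" matches exactly and "d" is forward-filled from "c", the last label smaller than "d".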

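6. copy | boolean | optional

Whether to return a new DataFrame even when the new labels are identical to the existing ones. reindex(~) never modifies the source DataFrame in place. By default, copy=True.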

7. level | string | optional

The level to target. This is only relevant when the source DataFrame has a MultiIndex.

8. fill_value | scalar | optional

The value to use for filling missing values. By default, fill_value=NaN.
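A brief sketch of fill_value, using 0 as an arbitrary replacement for the default NaN:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=["a", "b"])

# Row "c" does not exist in df, so its values are filled with 0 instead of NaN.
filled = df.reindex(index=["a", "c"], fill_value=0)
print(filled)
```

Because no NaN is introduced, the columns can keep their integer type here.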

9. limit | int | optional

The maximum number of consecutive missing values to forward/backward fill. By default, limit=None.
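The effect of limit is easiest to see with several new labels in a row. The following sketch, with illustrative index values, forward-fills at most one consecutive new label:

```python
import pandas as pd

df_limit = pd.DataFrame({"A": [2, 3]}, index=[1, 5])

# Labels 2, 3 and 4 are all new and all fall between the source labels 1 and 5.
# With limit=1 only the first of them (2) is forward-filled from label 1;
# labels 3 and 4 are left as NaN.
limited = df_limit.reindex(index=[1, 2, 3, 4], method="ffill", limit=1)
print(limited)
```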

10. tolerance | scalar or list | optional

Filling takes place only when the following criterion is met:

abs(index[indexer] - target) <= tolerance

Specifying tolerance without specifying method will result in an error. By default, tolerance=None.

Return value

A DataFrame with updated row or column labels.

Examples

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df



   A  B
a  2  4
b  3  5

Changing row labels

To change the index (i.e. row labels) to "a" and "c":

df.reindex(index=["a","c"])



   A    B
a  2.0  4.0
c  NaN  NaN

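Here, note the following:

The values at [aA] and [aB] are kept as they are, because both [aA] and [aB] exist in the source DataFrame.
The values at [cA] and [cB] are NaN, because [cA] and [cB] do not exist in the source DataFrame.
The existing values are displayed as floats (2.0 and 4.0) because NaN forces the integer columns to a floating-point type.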

Changing column labels

Here is our df from before:

df



   A  B
a  2  4
b  3  5

To set new column labels:

df.reindex(columns=["B","D"])



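   B   D
a  4 NaN
b  5 NaN

Here, note the following:

The values at [aB] and [bB] are kept as they are, because column B exists in the source DataFrame.
The values at [aD] and [bD] are NaN, because column D does not exist in the source DataFrame.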

Specifying method

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["b","d"])
df



   A  B
b  2  4
d  3  5

None

By default, method=None, which means that no filling is performed, so the values for new row or column labels will be NaN:

df.reindex(index=["a","c"])



   A    B
a  NaN  NaN
c  NaN  NaN

ffill

To fill using the previous value, pass method="ffill" like so:

df.reindex(index=["a","c"], method="ffill")



   A    B
a  NaN  NaN
c  2.0  4.0

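Here, note the following:

We still have NaN for index "a" because no index label is smaller than "a": the source DataFrame contains the index labels "b" and "d", both of which are greater than "a".
The values at index "c" are filled with the values at index "b" of the source DataFrame, because "b" is the last index label that is smaller than "c".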

bfill

For reference, here is df again:

df



   A  B
b  2  4
d  3  5

To fill using the next value, pass method="bfill" like so:

df.reindex(index=["a","c"], method="bfill")



   A  B
a  2  4
c  3  5

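Here, note the following:

The values in row "a" are filled with the values in row "b" of the source DataFrame, because the next index label greater than "a" is "b".
The same reasoning applies to index "c", which is filled with the values in row "d".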

Specifying tolerance

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=[3,6])
df



   A  B
3  2  4
6  3  5

Suppose we want to set a new index [5,7] with forward-filling. We can specify tolerance to control whether forward-filling takes place:

df.reindex(index=[5,7], method="ffill", tolerance=1)



   A    B
5  NaN  NaN
7  3.0  5.0

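Here, note the following:

The row with index 5 has NaN, because abs(3-5)=2 is greater than the specified tolerance.
The row with index 7 has been forward-filled using index 6 of the source DataFrame, because abs(6-7)=1 is less than or equal to the specified tolerance.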
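Since tolerance also accepts a list, a per-label tolerance can be sketched as follows. This assumes the list has one entry per new index label, in the same order. The values [2, 1] are chosen for illustration so that both rows pass the criterion.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 3], "B": [4, 5]}, index=[3, 6])

# One tolerance per new label: 2 for label 5, and 1 for label 7.
#   label 5: abs(3-5)=2 <= 2, so it is forward-filled from index 3
#   label 7: abs(6-7)=1 <= 1, so it is forward-filled from index 6
per_label = df.reindex(index=[5, 7], method="ffill", tolerance=[2, 1])
print(per_label)
```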


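Note: This article was selected and compiled by 純淨天空 from the original English work "Pandas DataFrame | reindex method" by Isshin Inada. Unless otherwise stated, the copyright of the original code belongs to the original author. Do not reproduce or copy this translation without permission or authorization.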
