Rename multiple columns of a DataFrame in Pandas

I'm using a DataFrame with column labels in Pandas, and I need to edit it to replace (that is, rename) the original column names (labels).

For example, I want to change the column names of DataFrame A, where the original column names are:

['$a', '$b', '$c', '$d', '$e']

and I want to change them to

['a', 'b', 'c', 'd', 'e'].

I have the edited column names stored in a list, but I don't know how to replace the column names.


Best solution

Just assign the new names to the .columns attribute:

>>> import pandas as pd
>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20
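A minimal sketch applying the same assignment to the five column names from the question. It assumes the DataFrame has exactly five columns: the list must have the same length as the number of columns, otherwise pandas raises a ValueError instead of silently truncating.

```python
import pandas as pd

# DataFrame with the original column names from the question
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=['$a', '$b', '$c', '$d', '$e'])

# The edited names, already stored in a list
new_names = ['a', 'b', 'c', 'd', 'e']

# Replace every column label, matched by position
df.columns = new_names
print(df.columns.tolist())

# A list of the wrong length is rejected and the labels stay as they were
raised = False
try:
    df.columns = ['a', 'b']
except ValueError:
    raised = True
```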

Second-best solution

Use the df.rename() function and refer to the columns to be renamed. Not all columns have to be renamed; you can rename just a subset of them:

df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy) 
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
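A small sketch of a partial rename, using hypothetical column names. Columns missing from the mapping are left unchanged, and by default a key that matches no column is silently ignored; passing errors='raise' (available since pandas 0.25) turns such a typo into a KeyError.

```python
import pandas as pd

df = pd.DataFrame({'oldName1': [1, 2], 'oldName2': [3, 4], 'keep': [5, 6]})

# Only the columns named in the mapping change; 'keep' is untouched
renamed = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})

# A key that matches no column is ignored by default...
ignored = df.rename(columns={'typo': 'x'})

# ...but errors='raise' reports it instead
raised = False
try:
    df.rename(columns={'typo': 'x'}, errors='raise')
except KeyError:
    raised = True
```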

Third solution

The rename method can also take a function instead, for example:

In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)
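Note that x[1:] drops the first character of every name, whether or not it is a '$'. A sketch of a more defensive function, assuming only a single leading '$' should be removed (strip_dollar is a hypothetical helper name):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['$a', '$b', 'c'])

def strip_dollar(name):
    """Remove one leading '$' if present; leave other names unchanged."""
    return name[1:] if name.startswith('$') else name

# x[1:] would turn 'c' into '' here; the conditional version keeps it
df = df.rename(columns=strip_dollar)
print(df.columns.tolist())
```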

Fourth solution

Since you only want to remove the $ sign from all column names, you could just do:

df = df.rename(columns=lambda x: x.replace('$', ''))

or

df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

Fifth solution

As documented in http://pandas.pydata.org/pandas-docs/stable/text.html:

df.columns = df.columns.str.replace('$', '', regex=False)  # match '$' literally, not as a regex anchor
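A sketch contrasting a literal replacement with a regular expression anchored to the start of the name, assuming you only want to strip a leading '$'. The regex argument is set explicitly in both calls so the result does not depend on the default of the pandas version in use.

```python
import pandas as pd

cols = pd.Index(['$a', '$b', 'c$'])

# Literal replacement removes every '$', wherever it appears
literal = cols.str.replace('$', '', regex=False)
print(literal.tolist())

# An escaped, anchored regex removes only a leading '$'
leading = cols.str.replace(r'^\$', '', regex=True)
print(leading.tolist())
```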

Sixth solution

df.columns = ['a', 'b', 'c', 'd', 'e']

The code above replaces the existing names with the names you provide, in the order you provide them.

You can also modify them by position, like this:

df.columns.values[2] = 'c'    # renames the 3rd column (position 2) to 'c'
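Writing into df.columns.values mutates the array behind an Index, which pandas otherwise treats as immutable, so it is not guaranteed to behave the same on every version. A sketch of a safer way to rename one column by position, assuming position 2 (the third column) is the one to change:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['$a', '$b', '$c'])

# Copy the labels into a plain list, change one position, assign the list back
cols = df.columns.tolist()
cols[2] = 'c'  # position 2 is the third column
df.columns = cols
print(df.columns.tolist())
```

Unlike df.rename(columns={df.columns[2]: 'c'}), this also works when the column labels are not unique.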

Seventh solution

Pandas 0.21+ answer

There have been some significant updates to column renaming in version 0.21.

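The rename method has gained an axis parameter, which may be set to columns or 1. This update makes the method match the rest of the pandas API. It still has the index and columns parameters, but you are no longer forced to use them.
The set_axis method, with inplace set to False, lets you rename all the index or column labels with a list.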

Examples for Pandas 0.21+

Construct a sample DataFrame:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4], 
                   '$c':[5,6], '$d':[7,8], 
                   '$e':[9,10]})

   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

Using `rename` with `axis='columns'` or `axis=1`

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')

or

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)

Both result in the following:

   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

It is still possible to use the old method signature:

df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})

The rename function also accepts a function that will be applied to each column name:

df.rename(lambda x: x[1:], axis='columns')

or

df.rename(lambda x: x[1:], axis=1)

Using `set_axis` with a list and `inplace=False`

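You can supply the set_axis method with a list whose length equals the number of columns (or index labels). In pandas 0.21, inplace defaulted to True, with the default planned to change to False in a future version. (It did become False in pandas 1.0, and pandas 2.0 removed the inplace argument from set_axis altogether, so on current versions you call set_axis without it and use the returned DataFrame.)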

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)

or

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)
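A minimal sketch of the same rename without the inplace argument, assuming pandas 1.0 or later, where set_axis already returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6],
                   '$d': [7, 8], '$e': [9, 10]})

# set_axis returns a relabelled copy; df itself keeps its old labels
renamed = df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns')
print(renamed.columns.tolist())
print(df.columns.tolist())
```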

Why not use `df.columns = ['a', 'b', 'c', 'd', 'e']`?

There is nothing wrong with assigning the columns directly. It is a perfectly good solution.

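The advantage of using set_axis is that it can be used as part of a method chain, and that it returns a new copy of the DataFrame. Without it, you would have to store the intermediate steps of the chain in another variable before reassigning the columns.

# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis='columns')  # add inplace=False on pandas < 1.0
   .some_method3())

# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()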
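A runnable version of the chaining pattern, with query and assign standing in for the hypothetical some_method steps. It assumes pandas 1.0 or later, where set_axis returns a new DataFrame by default.

```python
import pandas as pd

raw = pd.DataFrame({'$a': [1, 2, 3], '$b': [10, 20, 30]})

# Rename inside the chain; later steps can refer to the new names
result = (
    raw
    .set_axis(['a', 'b'], axis='columns')     # relabel all columns in one step
    .query('a > 1')                           # filter using the new name 'a'
    .assign(total=lambda d: d['a'] + d['b'])  # derive a column from the new names
)
print(result)
```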

Eighth solution

old_names = ['$a', '$b', '$c', '$d', '$e'] 
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)

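This way you can manually edit new_names as you wish. It works great when you need to rename only a few columns to correct misspellings or accents, remove special characters, and so on.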
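A sketch of that use case with hypothetical misspelled column names. Because zip() stops at the shorter list, a length check guards against silently skipping a rename.

```python
import pandas as pd

# Hypothetical columns with a typo and a stray accent
df = pd.DataFrame([[1, 2, 3]], columns=['Prodcut', 'Prïce', 'qty'])

# Only the names that need fixing
old_names = ['Prodcut', 'Prïce']
new_names = ['Product', 'Price']

# zip() would quietly drop unmatched entries, so check the lengths first
if len(old_names) != len(new_names):
    raise ValueError('old_names and new_names must have the same length')

df = df.rename(columns=dict(zip(old_names, new_names)))
print(df.columns.tolist())
```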

Ninth solution

Column names vs. names of Series

I would like to explain a bit of what happens behind the scenes.

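A DataFrame is a set of Series.

A Series, in turn, is an extension of a numpy.array (it wraps an array of values together with an index).

A Series has a .name attribute (a plain numpy.array does not).

This is the name of the Series. Pandas rarely respects this attribute, but it lingers in places and can be used to hack some pandas behaviors.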

Naming the list of columns

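Many answers here talk about the df.columns attribute being a list, when in fact it is a pandas Index (not a list, and not a Series either). This means it has a .name attribute.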

This is what happens if you decide to fill in the name of the columns Index:

import pandas as pd

df = pd.DataFrame([[4, 1], [5, 2], [6, 3]])
df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']

name of the list of columns     column_one  column_two
name of the index       
0                                    4           1
1                                    5           2
2                                    6           3

Note that the name of the index is always displayed one line below the column names.
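A sketch of the same labelling without mutating df, using DataFrame.rename_axis (its index= and columns= keywords are available since pandas 0.24). It also confirms that df.columns is an Index object.

```python
import pandas as pd

df = pd.DataFrame({'column_one': [4, 5, 6], 'column_two': [1, 2, 3]})

# df.columns is a pandas Index, which carries its own .name
print(type(df.columns).__name__)

# rename_axis returns a copy with the row index and column Index named
named = df.rename_axis(index='name of the index',
                       columns='name of the list of columns')
print(named.columns.name)
print(named.index.name)
```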

Lingering artifacts

The .name attribute sometimes lingers on. If you set df.columns = ['one', 'two'], then df.one.name will be 'one'.

If you set df.one.name = 'three', then df.columns will still give you ['one', 'two'], while df.one.name will give you 'three'.

BUT

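pd.DataFrame(df.one) will return a single-column DataFrame whose column is labeled 'three' rather than 'one', because pandas reuses the .name of the already defined Series.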
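A minimal sketch of that name reuse with a standalone Series, so it does not depend on how a particular pandas version caches column objects:

```python
import pandas as pd

# A Series carries its own .name, independent of any DataFrame label
s = pd.Series([1, 2, 3], name='three')

# Building a DataFrame from one Series reuses that name as the column label
frame = pd.DataFrame(s)
print(frame.columns.tolist())

# to_frame() does the same, and lets you override the name explicitly
print(s.to_frame().columns.tolist())
print(s.to_frame(name='one').columns.tolist())
```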

Multi-level column names

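Pandas has a way of doing multi-level column names. There is not much magic involved, but I wanted to cover it in my answer since I didn't see anyone mention it here.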

    |one            |
    |one      |two  |
0   |  4      |  1  |
1   |  5      |  2  |
2   |  6      |  3  |

This is easily achieved by setting the columns to a list of lists, like this:

df.columns = [['one', 'one'], ['one', 'two']]
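The list-of-lists assignment produces a MultiIndex for the columns. A sketch that builds it explicitly with pd.MultiIndex.from_arrays and shows how the levels are accessed, assuming the same values as in the table above:

```python
import pandas as pd

df = pd.DataFrame({'x': [4, 5, 6], 'y': [1, 2, 3]})

# Outer level is 'one' for both columns; inner level is 'one' and 'two'
df.columns = pd.MultiIndex.from_arrays([['one', 'one'], ['one', 'two']])

print(df.columns.nlevels)
# Selecting the outer label returns a sub-frame keyed by the inner level
print(df['one'].columns.tolist())
# A full (outer, inner) tuple selects a single column as a Series
print(df[('one', 'two')].tolist())
```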

Tenth solution

One-line or pipeline solutions

I'll focus on two things:

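OP clearly states "I have the edited column names stored in a list, but I don't know how to replace the column names." I do not want to solve the problem of how to replace '$' or strip the first character off each column header; OP has already done this step. Instead, I want to focus on replacing the existing columns object with a new one, given a list of replacement column names.
df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe's columns attribute, and it isn't done inline. I'll show a few ways to do this via pipelining without editing the existing dataframe.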

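Setup 1: To focus on the need to replace column names with a pre-existing list, I'll create a new sample dataframe df with initial column names and unrelated new column names.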

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

df

   Jack  Mahesh  Xin
0     1       3    5
1     2       4    6

Solution 1: pd.DataFrame.rename

It has already been said that if you have a dictionary mapping the old column names to the new column names, you can use pd.DataFrame.rename.

d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
df.rename(columns=d)

   x098  y765  z432
0     1     3     5
1     2     4     6

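However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that iterating over df iterates over its column names.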

# given just a list of new column names
df.rename(columns=dict(zip(df, new)))

   x098  y765  z432
0     1     3     5
1     2     4     6

This works great if your original column names are unique. But if they are not, it breaks down.

Setup 2: non-unique columns

df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']

df

   Mahesh  Mahesh  Xin
0       1       3    5
1       2       4    6

Solution 2: pd.concat using the keys argument

First, notice what happens when we attempt to use Solution 1:

df.rename(columns=dict(zip(df, new)))

   y765  y765  z432
0     1     3     5
1     2     4     6

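We didn't map the new list as the column names; we ended up with y765 repeated. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.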

pd.concat([c for _, c in df.items()], axis=1, keys=new) 

   x098  y765  z432
0     1     3     5
1     2     4     6

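Solution 3: Reconstruct. This should only be used if all columns share a single dtype. Otherwise, you'll end up with dtype object for every column, and converting them back requires more dictionary work.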

Single dtype

pd.DataFrame(df.values, df.index, new)

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype

pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6
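A small sketch of why the mixed-dtype version needs the astype step, using a hypothetical frame with one integer column and one string column: df.values collapses both to a common object dtype, and the {new_name: original_dtype} mapping puts the original dtypes back.

```python
import pandas as pd

# Hypothetical mixed-dtype frame
df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': ['p', 'q']})
new = ['x098', 'y765']

# df.values upcasts everything to a common dtype (object here)
print(df.values.dtype)
rebuilt = pd.DataFrame(df.values, df.index, new)

# Restore each column's original dtype under its new name
restored = rebuilt.astype(dict(zip(new, df.dtypes)))
print(restored.dtypes)
```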

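Solution 4: This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline, but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from Solution 3 applies here.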