Python pyspark Series.replace usage and code examples

This article briefly introduces the usage of pyspark.pandas.Series.replace.

Usage: Series.replace(to_replace: Union[Any, List, Tuple, Dict, None] = None, value: Union[List, Tuple, None] = None, regex: bool = False) → pyspark.pandas.series.Series

Replace the values given in to_replace with value. Values of the Series are replaced with other values dynamically.

Parameters:

to_replace: str, list, tuple, dict, Series, int, float, or None

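How to find the values that will be replaced.

numeric, str:
- numeric: numeric values equal to to_replace will be replaced with value.
- str: strings that exactly match to_replace will be replaced with value.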

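list of str or numeric:
- If to_replace and value are both lists or tuples, they must be the same length.
- The str and numeric rules above apply.
dict:
- A dict can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and 'y' with 'z'. To use a dict in this way, the value parameter should be None.
- For a DataFrame, a dict can specify that different values should be replaced in different columns. For example, {'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and the value 'z' in column 'b' and replaces these values with whatever is specified in value. The value parameter should not be None in this case. You can treat this as a special case of passing two lists, except that you are specifying the column to search in.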

See the Examples section for examples of each of these.
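
The scalar, list/tuple and dict rules can be summarized as building a lookup from old value to new value and applying it once to every element. The sketch below is a minimal pure-Python model of those rules, not pyspark's implementation: `replace_values` is a hypothetical helper, plain lists stand in for a Series so it runs without a Spark session, elements are assumed hashable, and missing-value handling and the DataFrame per-column form are not modeled.

```python
from typing import Any, Dict, List, Optional


def replace_values(values: List[Any], to_replace: Any,
                   value: Optional[Any] = None) -> List[Any]:
    """Hypothetical pure-Python model of the Series.replace matching rules.

    Plain lists stand in for a Series; elements are assumed hashable.
    """
    # Tuples are treated like lists.
    if isinstance(to_replace, tuple):
        to_replace = list(to_replace)
    if isinstance(value, tuple):
        value = list(value)

    if isinstance(to_replace, dict):
        # dict: each key gets its own replacement. The docs say value should be
        # None here; raising otherwise is this model's own choice.
        if value is not None:
            raise ValueError("value should be None when to_replace is a dict")
        mapping: Dict[Any, Any] = dict(to_replace)
    elif isinstance(to_replace, list) and isinstance(value, list):
        # Two lists: pair them position by position, so lengths must agree.
        if len(to_replace) != len(value):
            raise ValueError("to_replace and value must have the same length")
        mapping = dict(zip(to_replace, value))
    elif isinstance(to_replace, list):
        # List with a single value: every listed item is replaced by that value.
        mapping = {item: value for item in to_replace}
    else:
        # Scalar numeric or str: exact equality only, no substring matching.
        mapping = {to_replace: value}

    # One pass over the data; unmatched elements are kept unchanged.
    return [mapping.get(v, v) for v in values]


print(replace_values([0, 1, 2, 3, 4], 0, 5))                     # [5, 1, 2, 3, 4]
print(replace_values([0, 1, 2, 3, 4], (0, 4), 5000))             # [5000, 1, 2, 3, 5000]
print(replace_values([0, 1, 2, 3, 4], [1, 2, 3], [10, 20, 30]))  # [0, 10, 20, 30, 4]
print(replace_values(["a", "ab", "y"], {"a": "b", "y": "z"}))    # ['b', 'ab', 'z']
```

Because the lookup is applied in a single pass, a dict such as {1: 2, 2: 3} turns 1 into 2 and 2 into 3 without chaining 1 through to 3 in this model.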

value: scalar, dict, list, tuple, str, default None

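The value used to replace any values matching to_replace. For a DataFrame, a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings, and lists or dicts of such objects are also allowed.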

Returns:

Series: Object after replacement.

Examples:

Scalar to_replace and value

>>> import pyspark.pandas as ps
>>> s = ps.Series([0, 1, 2, 3, 4])
>>> s
0    0
1    1
2    2
3    3
4    4
dtype: int64

>>> s.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64

List-like to_replace

>>> s.replace([0, 4], 5000)
0    5000
1       1
2       2
3       3
4    5000
dtype: int64

>>> s.replace([1, 2, 3], [10, 20, 30])
0     0
1    10
2    20
3    30
4     4
dtype: int64

Dict-like to_replace

>>> s.replace({1: 1000, 2: 2000, 3: 3000, 4: 4000})
0       0
1    1000
2    2000
3    3000
4    4000
dtype: int64

MultiIndex is also supported

>>> import pandas as pd
>>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
...                       ['speed', 'weight', 'length']],
...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> s = ps.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
...               index=midx)
>>> s
lama    speed      45.0
        weight    200.0
        length      1.2
cow     speed      30.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64

>>> s.replace(45, 450)
lama    speed     450.0
        weight    200.0
        length      1.2
cow     speed      30.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64

>>> s.replace([45, 30, 320], 500)
lama    speed     500.0
        weight    200.0
        length      1.2
cow     speed     500.0
        weight    250.0
        length      1.5
falcon  speed     500.0
        weight      1.0
        length      0.3
dtype: float64

>>> s.replace({45: 450, 30: 300})
lama    speed     450.0
        weight    200.0
        length      1.2
cow     speed     300.0
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64
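
In the MultiIndex examples only the values change; the (outer, inner) index labels stay exactly where they are. The sketch below models that with plain tuples so it runs without Spark. `replace_in_rows` and the row layout are hypothetical and not part of pyspark, and the model covers only the dict form used in the last example.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the MultiIndex Series above: ((outer, inner), value).
Row = Tuple[Tuple[str, str], float]


def replace_in_rows(rows: List[Row], mapping: Dict[float, float]) -> List[Row]:
    """Replace matching values; the (outer, inner) index labels are never touched."""
    # float() mimics the float64 dtype of the example Series.
    return [(label, float(mapping.get(v, v))) for label, v in rows]


rows: List[Row] = [
    (("lama", "speed"), 45.0), (("lama", "weight"), 200.0), (("lama", "length"), 1.2),
    (("cow", "speed"), 30.0), (("cow", "weight"), 250.0), (("cow", "length"), 1.5),
    (("falcon", "speed"), 320.0), (("falcon", "weight"), 1.0), (("falcon", "length"), 0.3),
]

# The int keys 45 and 30 match the float values 45.0 and 30.0 (45 == 45.0 in Python).
for (outer, inner), v in replace_in_rows(rows, {45: 450, 30: 300}):
    print(f"{outer:<7} {inner:<7} {v}")
```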


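Note: This article was selected and compiled by 純淨天空 from the original English work pyspark.pandas.Series.replace by the authors at spark.apache.org. Unless otherwise stated, the copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.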
