Python pyspark CategoricalIndex.map usage and code examples

This article briefly introduces the usage of pyspark.pandas.CategoricalIndex.map.

Usage: CategoricalIndex.map(mapper: Union[dict, Callable[[Any], Any], pandas.core.series.Series]) → pyspark.pandas.indexes.base.Index

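Map values using an input correspondence (a dict, Series, or function).

Maps the values of the index (their categories, not the codes) to new categories. If the mapping correspondence is one-to-one, the result is a CategoricalIndex with the same order property as the original; otherwise an Index is returned.

If a dict or Series is used, any unmapped category is mapped to a missing value. Note that if this happens, an Index is returned.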
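The rule described above can be modelled in plain Python without a Spark session. The following is a minimal sketch, not the pyspark implementation: the helper names `apply_mapper` and `expected_result_type` are hypothetical, only dicts and functions are handled (a Series is left out), and a missing value is represented by `None`.

```python
from typing import Any, Callable, Hashable, List, Mapping, Sequence, Union

# Hypothetical alias for this sketch: a dict-like mapper or a function.
Mapper = Union[Mapping[Hashable, Any], Callable[[Any], Any]]


def apply_mapper(categories: Sequence[Hashable], mapper: Mapper) -> List[Any]:
    """Map each category; a dict lookup that misses yields None."""
    if callable(mapper):
        return [mapper(c) for c in categories]
    return [mapper.get(c) for c in categories]


def expected_result_type(categories: Sequence[Hashable], mapper: Mapper) -> str:
    """Name the result type the documented rule predicts.

    Assumes the mapped values are hashable so they can be de-duplicated.
    """
    mapped = apply_mapper(categories, mapper)
    has_missing = any(value is None for value in mapped)
    one_to_one = len(set(mapped)) == len(mapped)
    if one_to_one and not has_missing:
        return "CategoricalIndex"
    return "Index"
```

For the categories `['a', 'b', 'c']`, the sketch predicts `"CategoricalIndex"` for `str.upper` and `"Index"` for a dict that omits `'c'` or that sends two categories to the same value.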

Parameters:

mapper: function, dict, or Series. The mapping correspondence.

Returns:

CategoricalIndex or Index: the mapped index.

Examples:

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> idx = ps.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

>>> idx.map(lambda x: x.upper())  
CategoricalIndex(['A', 'B', 'C'],
                 categories=['A', 'B', 'C'], ordered=False, dtype='category')

>>> pser = pd.Series([1, 2, 3], index=pd.CategoricalIndex(['a', 'b', 'c'], ordered=True))
>>> idx.map(pser)  
CategoricalIndex([1, 2, 3],
                 categories=[1, 2, 3], ordered=False, dtype='category')

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})  
CategoricalIndex(['first', 'second', 'third'],
                 categories=['first', 'second', 'third'], ordered=False, dtype='category')

If the mapping is one-to-one, the ordering of the categories is preserved:

>>> idx = ps.CategoricalIndex(['a', 'b', 'c'], ordered=True)
>>> idx  
CategoricalIndex(['a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=True, dtype='category')

>>> idx.map({'a': 3, 'b': 2, 'c': 1})  
CategoricalIndex([3, 2, 1],
                 categories=[3, 2, 1], ordered=True, dtype='category')

If the mapping is not one-to-one, an Index is returned:

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a dict is used, all unmapped categories are mapped to None and the result is an Index:

>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', None], dtype='object')
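One way to avoid unmapped categories is to wrap the dict in a function that supplies a fallback for missing keys. The sketch below shows that wrapper in plain Python; `with_fallback` and `to_label` are hypothetical names introduced for illustration and are not part of pyspark.

```python
from typing import Any, Callable, Hashable, Mapping


def with_fallback(
    mapping: Mapping[Hashable, Any],
    fallback: Callable[[Any], Any] = lambda key: key,
) -> Callable[[Any], Any]:
    """Return a function that looks keys up in `mapping`.

    Keys absent from `mapping` are passed to `fallback`; by default
    the key itself is returned unchanged.
    """
    def mapper(key: Any) -> Any:
        return mapping[key] if key in mapping else fallback(key)

    return mapper


to_label = with_fallback({'a': 'first', 'b': 'second'})
print([to_label(c) for c in ['a', 'b', 'c']])  # ['first', 'second', 'c']
```

Since `map` accepts a function as well as a dict, `idx.map(to_label)` should leave no category unmapped, although that call has not been run against a Spark session here. Whether the result is a CategoricalIndex still depends on the fallback keeping the mapping one-to-one.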


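Note: This article was selected and compiled by 純淨天空 from the original English documentation for pyspark.pandas.CategoricalIndex.map at spark.apache.org. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.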
