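Python pyspark CategoricalIndex.map usage and code examples

This article briefly introduces the usage of pyspark.pandas.CategoricalIndex.map.

Usage: CategoricalIndex.map(mapper: Union[dict, Callable[[Any], Any], pandas.core.series.Series]) → pyspark.pandas.indexes.base.Index

Map values using an input correspondence (a dict, Series, or function).

Maps the values of the index (its categories, not its codes) to new categories. If the mapping correspondence is one-to-one, the result is a CategoricalIndex with the same ordered property as the original; otherwise an Index is returned.

If a dict or Series is used, any category without an entry in it is mapped to a missing value. Note that when this happens an Index is returned.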
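
The rule described above can be sketched in plain Python. This is only an illustration of the documented behavior for a dict mapper, not how pyspark implements `map`; the helper name `predict_result_type` is hypothetical and the sketch assumes the mapped values are hashable.

```python
from typing import Any, Dict, Hashable, Sequence


def predict_result_type(categories: Sequence[Hashable], mapper: Dict[Hashable, Any]) -> str:
    """Guess which index type the documented rule implies for a dict mapper.

    Hypothetical helper for illustration only; it does not call pyspark.
    """
    # A category with no entry in the dict becomes a missing value,
    # and in that case the documentation says an Index is returned.
    if any(cat not in mapper for cat in categories):
        return "Index"
    mapped = [mapper[cat] for cat in categories]
    # One-to-one: no two categories share the same mapped value.
    if len(set(mapped)) == len(mapped):
        return "CategoricalIndex"
    return "Index"


print(predict_result_type(["a", "b", "c"], {"a": 3, "b": 2, "c": 1}))
print(predict_result_type(["a", "b", "c"], {"a": "first", "b": "second", "c": "first"}))
print(predict_result_type(["a", "b", "c"], {"a": "first", "b": "second"}))
```

The three calls cover a one-to-one mapping, a many-to-one mapping, and a dict that leaves a category unmapped.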

Parameters:

mapper: function, dict, or Series: Mapping correspondence.

Returns:

CategoricalIndex or Index: Mapped index.

Examples:

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> idx = ps.CategoricalIndex(['a', 'b', 'c'])
>>> idx  
CategoricalIndex(['a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

>>> idx.map(lambda x: x.upper())  
CategoricalIndex(['A', 'B', 'C'],
                 categories=['A', 'B', 'C'], ordered=False, dtype='category')

>>> pser = pd.Series([1, 2, 3], index=pd.CategoricalIndex(['a', 'b', 'c'], ordered=True))
>>> idx.map(pser)  
CategoricalIndex([1, 2, 3],
                 categories=[1, 2, 3], ordered=False, dtype='category')

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})  
CategoricalIndex(['first', 'second', 'third'],
                 categories=['first', 'second', 'third'], ordered=False, dtype='category')

If the mapping is one-to-one, the ordering of the categories is preserved:

>>> idx = ps.CategoricalIndex(['a', 'b', 'c'], ordered=True)
>>> idx  
CategoricalIndex(['a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=True, dtype='category')

>>> idx.map({'a': 3, 'b': 2, 'c': 1})  
CategoricalIndex([3, 2, 1],
                 categories=[3, 2, 1], ordered=True, dtype='category')

If the mapping is not one-to-one, an Index is returned:

>>> idx.map({'a': 'first', 'b': 'second', 'c': 'first'})
Index(['first', 'second', 'first'], dtype='object')

If a dict is used, all unmapped categories are mapped to None and the result is an Index:

>>> idx.map({'a': 'first', 'b': 'second'})
Index(['first', 'second', None], dtype='object')
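
If falling back to missing values is not wanted, one option is to complete the dict so that every category has a target before calling `map`. The pyspark documentation does not prescribe this; it is a sketch. The helper name `complete_mapper` and the choice to map missing categories to themselves are both assumptions.

```python
from typing import Any, Dict, Hashable, Iterable


def complete_mapper(categories: Iterable[Hashable], mapper: Dict[Hashable, Any]) -> Dict[Hashable, Any]:
    """Return a new dict in which every category has an entry.

    Hypothetical helper: categories missing from `mapper` map to themselves.
    """
    return {cat: mapper.get(cat, cat) for cat in categories}


full = complete_mapper(["a", "b", "c"], {"a": "first", "b": "second"})
print(full)  # {'a': 'first', 'b': 'second', 'c': 'c'}
```

Passing `full` as the mapper then leaves no category unmapped. Whether the result stays a CategoricalIndex still depends on the completed mapping being one-to-one.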

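Note: This article was selected and compiled by 纯净天空 from the original English work pyspark.pandas.CategoricalIndex.map at spark.apache.org. Unless otherwise stated, copyright of the original code belongs to its original authors; please do not reprint or copy this translation without permission or authorization.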
