Python pyspark CategoricalIndex usage and code examples

This article briefly introduces the usage of pyspark.pandas.CategoricalIndex.

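Usage: class pyspark.pandas.CategoricalIndex

Index based on an underlying Categorical.

A CategoricalIndex can only take on a limited, and usually fixed, number of possible values (categories). It may also have an order, but numerical operations (addition, division, ...) are not possible.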

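Parameters:

data: array-like (1-dimensional): The values of the categorical. If categories are given, values not in categories are replaced with NaN.
categories: index-like, optional: The categories for the categorical. Items must be unique. If the categories are not given here (and also not in dtype), they are inferred from data.
ordered: bool, optional: Whether this categorical is treated as an ordered categorical. If not given here or in dtype, the resulting categorical is unordered.
dtype: CategoricalDtype or "category", optional: If a CategoricalDtype is given, it cannot be used together with categories or ordered.
copy: bool, default False: Make a copy of the input ndarray.
name: object, optional: Name to be stored in the index.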
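The parameter rules above can be sketched in a few lines. This sketch uses plain pandas (pd.CategoricalIndex) rather than pyspark.pandas so that it runs without a Spark session. That pyspark.pandas behaves the same way is an assumption based on the parameter descriptions above, not something verified here.

```python
import pandas as pd

# Values outside the given categories become missing (NaN, stored as code -1).
idx = pd.CategoricalIndex(["a", "b", "d"], categories=["a", "b", "c"])
missing_mask = idx.isna().tolist()
codes = idx.codes.tolist()

# A CategoricalDtype carries both the categories and the ordered flag.
dtype = pd.CategoricalDtype(categories=["c", "b", "a"], ordered=True)
idx_from_dtype = pd.CategoricalIndex(["a", "b", "c"], dtype=dtype)

# Combining a CategoricalDtype with categories (or ordered) is rejected.
try:
    pd.CategoricalIndex(["a", "b"], categories=["a", "b"], dtype=dtype)
    conflict_rejected = False
except ValueError:
    conflict_rejected = True

print(missing_mask, codes, idx_from_dtype.ordered, conflict_rejected)
```

Passing the category list and the ordering through a single CategoricalDtype keeps the two settings together, which is useful when several indexes need to share the same definition.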

Examples:

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> ps.CategoricalIndex(["a", "b", "c", "a", "b", "c"])
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

A CategoricalIndex can also be instantiated from a Categorical:

>>> c = pd.Categorical(["a", "b", "c", "a", "b", "c"])
>>> ps.CategoricalIndex(c)  
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

An ordered CategoricalIndex can have a minimum and a maximum value.

>>> ci = ps.CategoricalIndex(
...     ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
... )
>>> ci  
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['c', 'b', 'a'], ordered=True, dtype='category')
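To make the ordering concrete, here is a minimal sketch of how the minimum and maximum follow the declared category order rather than alphabetical order. It again uses plain pandas so it can run without Spark. Whether pyspark.pandas returns identical results from min(), max(), and sort_values() is an assumption.

```python
import pandas as pd

# Category order is c < b < a, so "c" is the smallest value and "a" the largest.
ci = pd.CategoricalIndex(
    ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
)
lowest = ci.min()
highest = ci.max()

# Sorting also follows the category order, not the lexical order of the labels.
sorted_values = list(ci.sort_values())

print(lowest, highest, sorted_values)
```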

From a Series:

>>> s = ps.Series(["a", "b", "c", "a", "b", "c"], index=[10, 20, 30, 40, 50, 60])
>>> ps.CategoricalIndex(s)  
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

From an Index:

>>> idx = ps.Index(["a", "b", "c", "a", "b", "c"])
>>> ps.CategoricalIndex(idx)  
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

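Note: This article was selected and compiled by 纯净天空 from the original English work pyspark.pandas.CategoricalIndex on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original author. Do not reprint or copy this translation without permission or authorization.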