Python pyspark CategoricalIndex usage and code examples

This article briefly introduces how to use pyspark.pandas.CategoricalIndex.

Usage: class pyspark.pandas.CategoricalIndex

An index based on an underlying Categorical.

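A CategoricalIndex can only take on a limited, and usually fixed, number of possible values (categories). It may also have an order, but numerical operations (addition, division, ...) are not possible.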
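To make these properties concrete, here is a small sketch using plain pandas, which pyspark.pandas depends on and whose `CategoricalIndex` it mirrors. We assume, without having checked it here, that `ps.CategoricalIndex` behaves the same way. The sketch shows the inferred category set, the integer codes that back each value, and that arithmetic is rejected.

```python
import pandas as pd

# Plain pandas stands in for pyspark.pandas here (assumed to behave the same).
# When no categories are given, they are inferred from the data (unique, sorted).
idx = pd.CategoricalIndex(["b", "a", "c", "a"])
print(idx.categories.tolist())  # ['a', 'b', 'c']

# Each value is stored as an integer code pointing into the categories.
print(idx.codes.tolist())  # [1, 0, 2, 0]

# Numerical operations such as addition are not defined for categorical data.
try:
    idx + idx
    arithmetic_rejected = False
except TypeError:
    arithmetic_rejected = True
print(arithmetic_rejected)  # True
```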

Parameters:

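data: array-like (1-dimensional): The values of the categorical. If categories are given, values not in categories are replaced with NaN.
categories: index-like, optional: The categories for the categorical. Items must be unique. If categories are not given here (and also not in dtype), they are inferred from data.
ordered: bool, optional: Whether this categorical is treated as an ordered categorical. If not given here or in dtype, the resulting categorical is unordered.
dtype: CategoricalDtype or "category", optional: If a CategoricalDtype is given, it cannot be used together with categories or ordered.
copy: bool, default False: Make a copy of the input ndarray.
name: object, optional: Name to be stored in the index.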
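The parameter rules above can be sketched with plain pandas, again assuming pyspark.pandas follows the same semantics: a value outside `categories` becomes NaN (code -1), `name` is stored on the index, and a `CategoricalDtype` cannot be combined with `categories`. The labels "low"/"high" and the name "letters" are illustrative data only.

```python
import pandas as pd

# Plain pandas stands in for pyspark.pandas here (assumed to behave the same).
# "d" is not one of the given categories, so it is replaced with NaN (code -1).
idx = pd.CategoricalIndex(["a", "b", "d"], categories=["a", "b", "c"], name="letters")
print(idx.codes.tolist())   # [0, 1, -1]
print(idx.isna().tolist())  # [False, False, True]
print(idx.name)             # letters

# Categories and ordering can instead be bundled into a CategoricalDtype.
dtype = pd.CategoricalDtype(categories=["low", "high"], ordered=True)
sizes = pd.CategoricalIndex(["high", "low", "high"], dtype=dtype)
print(sizes.ordered)  # True

# Passing a CategoricalDtype together with `categories` raises ValueError.
try:
    pd.CategoricalIndex(["low"], categories=["low", "high"], dtype=dtype)
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
print(conflict_rejected)  # True
```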

Examples:

>>> ps.CategoricalIndex(["a", "b", "c", "a", "b", "c"])  
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

A CategoricalIndex can also be instantiated from a Categorical:

>>> c = pd.Categorical(["a", "b", "c", "a", "b", "c"])
>>> ps.CategoricalIndex(c)  
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')
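Building from a `Categorical` is mainly useful when the categories or their order are already defined on it. A plain-pandas sketch, assuming the pyspark.pandas constructor keeps these attributes the same way:

```python
import pandas as pd

# A Categorical that already carries custom categories and an order.
cat = pd.Categorical(["a", "b", "a"], categories=["b", "a"], ordered=True)
idx = pd.CategoricalIndex(cat)

# The values, the category order and the ordered flag carry over to the index.
print(idx.tolist())             # ['a', 'b', 'a']
print(idx.categories.tolist())  # ['b', 'a']
print(idx.ordered)              # True
```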

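An ordered CategoricalIndex can have a minimum and a maximum value. In the example below the order comes from categories, so "c" is the smallest value and "a" the largest.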

>>> ci = ps.CategoricalIndex(
...     ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
... )
>>> ci  
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['c', 'b', 'a'], ordered=True, dtype='category')
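The example above builds the ordered index but does not call min or max. The sketch below does so in plain pandas, assuming `ps.CategoricalIndex` supports `min()`/`max()` with the same semantics as the text states. It also shows that pandas refuses `min()` when the categories are unordered.

```python
import pandas as pd

# Plain pandas stands in for pyspark.pandas here (assumed to behave the same).
ci = pd.CategoricalIndex(
    ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
)
# min/max follow the category order ("c" < "b" < "a"), not alphabetical order.
print(ci.min(), ci.max())  # c a

# With unordered categories, pandas raises TypeError for min().
unordered = pd.CategoricalIndex(["b", "a", "c"])
try:
    unordered.min()
    min_rejected = False
except TypeError:
    min_rejected = True
print(min_rejected)  # True
```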

From a Series (its values are used; the Series' own index labels 10 to 60 are not):

>>> s = ps.Series(["a", "b", "c", "a", "b", "c"], index=[10, 20, 30, 40, 50, 60])
>>> ps.CategoricalIndex(s)  
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')
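A plain-pandas sketch of the same conversion, assuming pyspark.pandas treats a Series the same way, followed by a typical use: labelling a DataFrame with the categorical values and selecting rows by category. The column name "x" is illustrative only.

```python
import pandas as pd

# Plain pandas stands in for pyspark.pandas here (assumed to behave the same).
s = pd.Series(["a", "b", "c", "a", "b", "c"], index=[10, 20, 30, 40, 50, 60])
ci = pd.CategoricalIndex(s)

# The Series' values become the index entries; its labels 10..60 are dropped.
print(ci.tolist())  # ['a', 'b', 'c', 'a', 'b', 'c']

# Use the categorical index to label rows, then select every row labelled "a".
df = pd.DataFrame({"x": range(6)}, index=ci)
print(df.loc["a", "x"].tolist())  # [0, 3]
```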

From an Index:

>>> idx = ps.Index(["a", "b", "c", "a", "b", "c"])
>>> ps.CategoricalIndex(idx)  
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')
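In plain pandas, wrapping an existing Index in the constructor gives the same result as casting it with `astype("category")`. The sketch below checks that equivalence in pandas only; whether `ps.Index.astype("category")` is available, and behaves identically, is an assumption that depends on the pyspark version.

```python
import pandas as pd

# Plain pandas stands in for pyspark.pandas here (equivalence checked in pandas only).
idx = pd.Index(["a", "b", "c", "a", "b", "c"])

# Two routes to a categorical index: the constructor and astype("category").
from_ctor = pd.CategoricalIndex(idx)
from_astype = idx.astype("category")

print(type(from_astype).__name__)     # CategoricalIndex
print(from_ctor.equals(from_astype))  # True
```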


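Note: This article was selected and compiled by 純淨天空 from pyspark.pandas.CategoricalIndex, an original English work by experts at spark.apache.org. Unless otherwise stated, the copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.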