Python cuml.preprocessing.LabelEncoder.LabelEncoder usage and code examples

Usage: class cuml.preprocessing.LabelEncoder.LabelEncoder(*, handle_unknown='error', handle=None, verbose=False, output_type=None)

An nvcategory-based implementation of ordinal label encoding.

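Parameters:

handle_unknown: {'error', 'ignore'}, default='error': Whether to raise an error or ignore unknown categorical features encountered during transform (the default is to raise). When this parameter is set to 'ignore' and an unknown category is encountered during transform or inverse transform, the resulting encoding will be null.
handle: cuml.Handle: Specifies the cuml.handle that holds the internal CUDA state for computations in this model. Most importantly, this specifies the CUDA stream that will be used for the model's computations, so users can run different models concurrently in different streams by creating handles in several streams. If None, a new one is created.
verbose: int or boolean, default=False: Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more information.
output_type: {'input', 'cudf', 'cupy', 'numpy', 'numba'}, default=None: Variable that controls the output type of the estimator's results and attributes. If None, it inherits the output type set at the module level, cuml.global_settings.output_type. See Output Data Type Configuration for more information.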
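The effect of handle_unknown can be illustrated with a small CPU-only sketch. This is not cuML's implementation and does not call cuML; the helper names fit_classes and encode are hypothetical, Python's None stands in for a null entry, and the sketch assumes codes are assigned in sorted order of the distinct labels.

```python
from typing import Dict, List, Optional, Sequence


def fit_classes(labels: Sequence[str]) -> Dict[str, int]:
    """Map each distinct label to an integer code, in sorted order (assumption)."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


def encode(
    labels: Sequence[str],
    mapping: Dict[str, int],
    handle_unknown: str = "error",
) -> List[Optional[int]]:
    """Encode labels; unseen labels either raise or become None (stand-in for null)."""
    if handle_unknown not in ("error", "ignore"):
        raise ValueError("handle_unknown must be 'error' or 'ignore'")
    encoded: List[Optional[int]] = []
    for label in labels:
        if label in mapping:
            encoded.append(mapping[label])
        elif handle_unknown == "error":
            raise KeyError(f"unseen label: {label!r}")
        else:
            encoded.append(None)  # 'ignore': emit a null for the unknown label
    return encoded


mapping = fit_classes(["a", "b", "c", "d"])
print(encode(["c", "a"], mapping))                           # [2, 0]
print(encode(["c", "z"], mapping, handle_unknown="ignore"))  # [2, None]
```

With the default 'error' setting, the same call on ["c", "z"] raises instead of returning a partially encoded result.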

Examples:

Converting a categorical implementation to a numerical one

from cudf import DataFrame, Series
from cuml.preprocessing import LabelEncoder

data = DataFrame({'category': ['a', 'b', 'c', 'd']})

# There are two functionally equivalent ways to do this
le = LabelEncoder()
le.fit(data.category)  # le = le.fit(data.category) also works
encoded = le.transform(data.category)

print(encoded)

# This method is preferred
le = LabelEncoder()
encoded = le.fit_transform(data.category)

print(encoded)

# We can assign this to a new column
data = data.assign(encoded=encoded)
print(data.head())

# We can also encode more data
test_data = Series(['c', 'a'])
encoded = le.transform(test_data)
print(encoded)

# After training, ordinal labels can be converted back to
# string labels with inverse_transform()
ord_label = Series([0, 0, 1, 2, 1])
str_label = le.inverse_transform(ord_label)
print(str_label)

Output:

0    0
1    1
2    2
3    3
dtype: int64

0    0
1    1
2    2
3    3
dtype: int32

  category  encoded
0        a        0
1        b        1
2        c        2
3        d        3

0    2
1    0
dtype: int64

0    a
1    a
2    b
3    c
4    b
dtype: object
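The values in the output above can be cross-checked on the CPU with plain Python. This is only a sketch: it does not call cuML, it returns lists rather than cudf Series, and it assumes that codes follow the sorted order of the distinct labels, which is consistent with the output shown.

```python
# CPU-only cross-check of the encoded values above; this does not call cuML.
categories = ["a", "b", "c", "d"]

# Assumption: codes follow the sorted order of the distinct labels.
classes = sorted(set(categories))
code_of = {label: code for code, label in enumerate(classes)}

encoded = [code_of[label] for label in categories]       # fit_transform analogue
test_encoded = [code_of[label] for label in ["c", "a"]]  # transform analogue
decoded = [classes[code] for code in [0, 0, 1, 2, 1]]    # inverse_transform analogue

print(encoded)       # [0, 1, 2, 3]
print(test_encoded)  # [2, 0]
print(decoded)       # ['a', 'a', 'b', 'c', 'b']
```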


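Note: This article was selected and compiled by 純淨天空 from the original English work cuml.preprocessing.LabelEncoder.LabelEncoder by rapids.ai. Unless otherwise stated, copyright in the original code belongs to the original author; please do not reprint or copy this translation without permission.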