Python cuml.LogisticRegression usage and code examples

Usage: class cuml.LogisticRegression(*, penalty='l2', tol=0.0001, C=1.0, fit_intercept=True, class_weight=None, max_iter=1000, linesearch_max_iter=50, verbose=False, l1_ratio=None, solver='qn', handle=None, output_type=None)

LogisticRegression is a linear model used to model the probability of an event occurring, for example the probability that an event succeeds or fails.

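In addition to cuDF objects, cuML's LogisticRegression accepts array-like objects, either on the host as NumPy arrays or on the device (as Numba or __cuda_array_interface__ compliant arrays). It provides both a single-class variant (binary classification, using sigmoid loss) and a multiple-class variant (using softmax loss), depending on the input variables.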
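To make the two loss variants concrete, here is a minimal pure-Python sketch of the sigmoid and softmax functions that turn linear scores into probabilities. The function names are illustrative only and are not part of cuML's API.

```python
import math


def sigmoid(z):
    """Map a single linear score to the probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Map one score per class to a probability distribution over classes."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


binary_prob = sigmoid(0.0)              # a score of 0 maps to probability 0.5
multi_probs = softmax([1.0, 2.0, 3.0])  # one probability per class, summing to 1
```

The sigmoid case needs only one score per sample, while softmax needs one score per class, which is why the shape of the fitted coefficients depends on the number of classes.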

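Only one solver option is currently available: the Quasi-Newton (QN) algorithm. Although it is exposed as a single option, this solver resolves to two different algorithms underneath: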

Orthant-Wise Limited-memory Quasi-Newton (OWL-QN), if there is l1 regularization
Limited-memory BFGS (L-BFGS) otherwise.

Note that, just as in Scikit-learn, the bias is not regularized.

Parameters:

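penalty: 'none', 'l1', 'l2', 'elasticnet' (default = 'l2'):

Specifies the norm used in the penalization. If 'none' or 'l2' is selected, the L-BFGS solver is used. If 'l1' is selected, the OWL-QN solver is used. If 'elasticnet' is selected, OWL-QN is used when l1_ratio > 0; otherwise L-BFGS is used.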
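The selection rules above can be sketched as a small function. This is a hypothetical helper written for illustration, not a cuML function; treating l1_ratio=None as "no l1 regularization" is an assumption of the sketch.

```python
def select_qn_algorithm(penalty="l2", l1_ratio=None):
    """Return the algorithm name that the documented rules would pick.

    Hypothetical helper for illustration; not part of cuML.
    """
    if penalty in ("none", "l2"):
        return "L-BFGS"
    if penalty == "l1":
        return "OWL-QN"
    if penalty == "elasticnet":
        # Assumption: a missing l1_ratio is treated as no l1 regularization.
        has_l1 = l1_ratio is not None and l1_ratio > 0
        return "OWL-QN" if has_l1 else "L-BFGS"
    raise ValueError(f"unknown penalty: {penalty!r}")


choices = {
    "none": select_qn_algorithm("none"),
    "l2": select_qn_algorithm("l2"),
    "l1": select_qn_algorithm("l1"),
    "elasticnet, l1_ratio=0.5": select_qn_algorithm("elasticnet", 0.5),
    "elasticnet, l1_ratio=0.0": select_qn_algorithm("elasticnet", 0.0),
}
```

In short, OWL-QN is chosen only when an l1 term is actually present in the objective.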

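tol: float (default = 1e-4):

Tolerance for the stopping criteria. The exact stopping conditions depend on the chosen solver. See the solver's documentation for more details: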

cuml.QN

C: float (default = 1.0):

Inverse of regularization strength; must be a positive float.

fit_intercept: boolean (default = True):

If True, the model tries to correct for the global mean of y. If False, the model expects that you have centered the data.

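class_weight: dict or 'balanced', default=None:

By default, all classes have a weight of one. However, a dictionary can be provided with weights associated with classes in the form {class_label: weight}. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)). Note that these weights are multiplied by sample_weight (passed through the fit method) if sample_weight is specified.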
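As a worked illustration of the 'balanced' formula, the following pure-Python sketch computes n_samples / (n_classes * count) per class and then multiplies by per-sample weights. The helper names are hypothetical, and collections.Counter stands in for np.bincount so the sketch has no dependencies.

```python
from collections import Counter


def balanced_class_weights(y):
    """Compute n_samples / (n_classes * count(class)) for each class label."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def effective_sample_weights(y, class_weight, sample_weight=None):
    """Multiply each sample's class weight by its sample_weight (default 1.0)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return [class_weight[label] * w for label, w in zip(y, sample_weight)]


y = [0, 0, 0, 1]
weights = balanced_class_weights(y)  # class 0: 4 / (2 * 3), class 1: 4 / (2 * 1)
per_sample = effective_sample_weights(y, weights, sample_weight=[1.0, 1.0, 2.0, 0.5])
```

The rare class (label 1) receives the larger weight, so each class contributes equally to the loss in aggregate before sample_weight is applied.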

max_iter: int (default = 1000):

Maximum number of iterations taken for the solver to converge.

linesearch_max_iter: int (default = 50):

Maximum number of line-search iterations per outer iteration, used in the lbfgs and owl QN solvers.

verbose: int or boolean, default=False

Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more information.

l1_ratio: float or None, optional (default=None):

The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1.
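To show how C and l1_ratio can interact, here is a minimal sketch that splits an overall strength of 1/C into l1 and l2 parts and evaluates the penalty on the coefficients only, leaving the intercept unpenalized as noted earlier. The split and the 0.5 factor on the l2 term are assumed conventions for illustration, and the function names are hypothetical; the source does not confirm that cuML scales the terms exactly this way.

```python
def elastic_net_strengths(C=1.0, l1_ratio=0.5):
    """Split the overall strength 1/C into l1 and l2 parts (assumed convention)."""
    if C <= 0:
        raise ValueError("C must be a positive float")
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must satisfy 0 <= l1_ratio <= 1")
    strength = 1.0 / C
    return l1_ratio * strength, (1.0 - l1_ratio) * strength


def penalty_term(coef, C=1.0, l1_ratio=0.5):
    """Penalty on the coefficients only; the intercept is deliberately excluded."""
    l1_strength, l2_strength = elastic_net_strengths(C, l1_ratio)
    l1_norm = sum(abs(w) for w in coef)
    l2_norm_sq = sum(w * w for w in coef)
    # Assumption: the l2 part carries a conventional 0.5 factor.
    return l1_strength * l1_norm + 0.5 * l2_strength * l2_norm_sq


l1_s, l2_s = elastic_net_strengths(C=2.0, l1_ratio=0.25)
value = penalty_term([1.0, -2.0], C=2.0, l1_ratio=0.25)
```

With l1_ratio = 0 the penalty reduces to pure l2, and with l1_ratio = 1 to pure l1; a larger C shrinks both parts.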

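solver: 'qn', 'lbfgs', 'owl' (default='qn'):

Algorithm used for the optimization problem. Currently only qn is supported, which automatically selects either L-BFGS or OWL-QN depending on the l1 regularization conditions described above. The options 'lbfgs' and 'owl' are just convenience values that end up using the same solver following the same rules.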

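handle: cuml.Handle

Specifies the cuml.handle that holds the internal CUDA state for computations in this model. Most importantly, this specifies the CUDA stream that will be used for the model's computations, so users can run different models concurrently in different streams by creating handles in several streams. If None, a new one is created.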

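output_type: {'input', 'cudf', 'cupy', 'numpy', 'numba'}, default=None

Variable to control the output type of the results and attributes of the estimator. If None, it inherits the output type set at the module level, cuml.global_settings.output_type. See Output Data Type Configuration for more information.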

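Notes:

cuML's LogisticRegression uses a different solver than the equivalent Scikit-learn estimator, except when there is no penalty and solver=lbfgs is used in Scikit-learn. This can cause (small) differences in the coefficients and predictions of the model, similar to using different solvers in Scikit-learn.

For additional information, see Scikit-learn's LogisticRegression.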

Examples:

import cudf
import numpy as np

# Both import methods supported
# from cuml import LogisticRegression
from cuml.linear_model import LogisticRegression

X = cudf.DataFrame()
X['col1'] = np.array([1,1,2,2], dtype = np.float32)
X['col2'] = np.array([1,2,2,3], dtype = np.float32)
y = cudf.Series( np.array([0.0, 0.0, 1.0, 1.0], dtype = np.float32) )

reg = LogisticRegression()
reg.fit(X,y)

print("Coefficients:")
print(reg.coef_)
print("Intercept:")
print(reg.intercept_)

X_new = cudf.DataFrame()
X_new['col1'] = np.array([1,5], dtype = np.float32)
X_new['col2'] = np.array([2,5], dtype = np.float32)

preds = reg.predict(X_new)

print("Predictions:")
print(preds)

Output:

Coefficients:
            0.22309814
            0.21012752
Intercept:
            -0.7548761
Predictions:
            0    0.0
            1    1.0

Attributes:

coef_: device array, dim (n_classes, n_features) or (n_classes, n_features+1): The estimated coefficients of the logistic regression model.
intercept_: device array (n_classes, 1): The independent term (intercept). If fit_intercept is False, it is 0.
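To show how coef_ and intercept_ relate to the predictions, the following sketch recomputes the example's predictions by hand from the printed values. It assumes the binary case, where the predicted class is 1 when the sigmoid of the linear score is at least 0.5; the function name is illustrative.

```python
import math

# Values copied from the example output above.
coef = [0.22309814, 0.21012752]
intercept = -0.7548761
X_new = [[1.0, 2.0], [5.0, 5.0]]


def predict_proba_positive(row):
    """Probability of class 1: sigmoid of the linear score w.x + b."""
    score = sum(w * x for w, x in zip(coef, row)) + intercept
    return 1.0 / (1.0 + math.exp(-score))


probs = [predict_proba_positive(row) for row in X_new]
# Assumption: a 0.5 probability threshold separates class 0 from class 1.
preds = [1.0 if p >= 0.5 else 0.0 for p in probs]
print(preds)  # [0.0, 1.0], matching the Predictions shown above
```

The first row has a linear score of about -0.11 (probability below 0.5) and the second about 1.41 (probability above 0.5), which reproduces the predicted labels 0.0 and 1.0.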


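Note: this article was selected and compiled by 純淨天空 from cuml.LogisticRegression, an original English work by rapids.ai. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.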