Python cuml.MBSGDRegressor usage and code examples

Usage: class cuml.MBSGDRegressor(*, loss='squared_loss', penalty='l2', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, epochs=1000, tol=0.001, shuffle=True, learning_rate='constant', eta0=0.001, power_t=0.5, batch_size=32, n_iter_no_change=5, handle=None, verbose=False, output_type=None)

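Fits a linear regression model by minimizing a regularized empirical loss with mini-batch SGD. The MBSGD Regressor implementation is experimental, and it uses a different algorithm from sklearn's SGDRegressor. To improve the results obtained from cuML's MBSGD Regressor:

* Reduce the batch size
* Increase eta0
* Increase the number of iterations

Because cuML analyzes the data in batches, a small eta0 might not let the model learn as much as scikit-learn's does. In addition, reducing the batch size might increase the time required to fit the model.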
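The effect of these tuning tips can be illustrated with a minimal NumPy sketch of mini-batch SGD for a squared loss with an optional L2 penalty. This is not cuML's implementation: the function name `minibatch_sgd_fit`, the gradient scaling, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def minibatch_sgd_fit(X, y, eta0=0.001, epochs=1000, batch_size=32,
                      alpha=0.0, shuffle=True, seed=0):
    """Hypothetical helper: fit y ~ X @ coef + intercept with mini-batch SGD."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    coef = np.zeros(n_features)
    intercept = 0.0
    for _ in range(epochs):
        # Visit the rows in a fresh random order each epoch when shuffle is on.
        order = rng.permutation(n_samples) if shuffle else np.arange(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ coef + intercept - y[idx]
            # Gradient of 0.5 * mean(residual**2) + 0.5 * alpha * ||coef||^2
            # (this scaling is an assumption; libraries differ).
            grad_coef = X[idx].T @ residual / len(idx) + alpha * coef
            grad_intercept = residual.mean()
            coef -= eta0 * grad_coef
            intercept -= eta0 * grad_intercept
    return coef, intercept

# Noise-free synthetic data: y = 2*x0 - 3*x1 + 1.
rng = np.random.default_rng(42)
X = rng.normal(size=(64, 2))
true_coef = np.array([2.0, -3.0])
y = X @ true_coef + 1.0

# Small batches and a larger eta0: many effective updates per epoch.
coef, intercept = minibatch_sgd_fit(X, y, eta0=0.05, epochs=200, batch_size=8)

# Larger batches and a small eta0: far fewer, far smaller updates.
coef_slow, intercept_slow = minibatch_sgd_fit(X, y, eta0=0.001, epochs=200,
                                              batch_size=32)

print("tuned:  ", coef, intercept)
print("default:", coef_slow, intercept_slow)
```

With the same number of epochs, the second run takes fewer and smaller steps, so it ends further from the true coefficients. That is the behaviour the tips above are meant to counteract.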

Parameters:

loss: 'squared_loss' (default = 'squared_loss')

'squared_loss' uses linear regression

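penalty: 'none', 'l1', 'l2', 'elasticnet' (default = 'l2')

'none': performs no regularization
'l1': performs L1 norm (Lasso) regularization, which minimizes the sum of the absolute values of the coefficients
'l2': performs L2 norm (Ridge) regularization, which minimizes the sum of the squares of the coefficients
'elasticnet': performs Elastic Net regularization, which is a weighted average of the L1 and L2 norms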

alpha: float (default = 0.0001)

The constant value that determines the degree of regularization

fit_intercept: boolean (default = True)

If True, the model tries to correct for the global mean of y. If False, the model expects that you have already centered the data.
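To see what "centered the data" means in practice, the sketch below uses ordinary least squares in NumPy (not cuML, and not SGD) to show that fitting an explicit intercept and fitting on mean-centered data without one give the same coefficients. The data and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(50, 2))
y = X @ np.array([1.5, -0.5]) + 4.0

# Option 1: fit an explicit intercept by appending a column of ones.
X_aug = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
coef_with_intercept, intercept = solution[:2], solution[2]

# Option 2: center X and y first, then fit without an intercept.
X_centered = X - X.mean(axis=0)
y_centered = y - y.mean()
coef_centered, *_ = np.linalg.lstsq(X_centered, y_centered, rcond=None)

# The intercept can be recovered from the means afterwards.
recovered_intercept = y.mean() - X.mean(axis=0) @ coef_centered

print(coef_with_intercept, intercept)
print(coef_centered, recovered_intercept)
```

If the data is not centered and fit_intercept is False, the model has no way to represent the offset, so the coefficients are biased to compensate.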

l1_ratio: float (default = 0.15)

l1_ratio is used only when penalty = 'elasticnet'. Its value should satisfy 0 <= l1_ratio <= 1. When l1_ratio = 0 the penalty is equivalent to 'l2', and when l1_ratio = 1 it is equivalent to 'l1'.
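The relationship between penalty, alpha, and l1_ratio can be sketched as a small function. The name `penalty_value` is hypothetical, and the exact scaling (for example, whether the L2 term carries a factor of 0.5) is an assumption; cuML's internal formula may differ.

```python
import numpy as np

def penalty_value(coef, penalty="l2", alpha=0.0001, l1_ratio=0.15):
    """Hypothetical helper: regularization term added to the loss."""
    coef = np.asarray(coef, dtype=float)
    l1 = np.abs(coef).sum()      # sum of absolute values (Lasso term)
    l2 = np.square(coef).sum()   # sum of squares (Ridge term)
    if penalty == "none":
        return 0.0
    if penalty == "l1":
        return alpha * l1
    if penalty == "l2":
        return alpha * l2
    if penalty == "elasticnet":
        # Weighted average of the two terms, controlled by l1_ratio.
        return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)
    raise ValueError(f"unknown penalty: {penalty!r}")

coef = [1.0, -2.0, 3.0]
print(penalty_value(coef, "l1", alpha=0.5))
print(penalty_value(coef, "l2", alpha=0.5))
print(penalty_value(coef, "elasticnet", alpha=0.5, l1_ratio=0.15))
```

Setting l1_ratio to 0 or 1 in the 'elasticnet' branch reproduces the 'l2' and 'l1' values respectively, which matches the description above.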

batch_size: int (default = 32)

Sets the number of samples included in each batch.

epochs: int (default = 1000)

The number of times the model should iterate through the entire dataset during training
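Together, batch_size and epochs bound how many parameter updates a run can perform. The arithmetic can be sketched as follows, assuming that a final partial batch is still processed and that training is not stopped early; `total_updates` is a hypothetical helper, not part of cuML.

```python
import math

def total_updates(n_samples, batch_size=32, epochs=1000):
    """Hypothetical helper: upper bound on the number of SGD updates."""
    # Assumes the last, possibly smaller, batch of each epoch is used.
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch * epochs

# Four samples, one sample per batch, 2000 epochs.
print(total_updates(4, batch_size=1, epochs=2000))
# One hundred samples with the default batch size, 10 epochs.
print(total_updates(100, batch_size=32, epochs=10))
```

This is why reducing the batch size tends to increase fitting time: the same number of epochs translates into more individual updates.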

tol: float (default = 1e-3)

Training stops if current_loss > previous_loss - tol
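Written as code, the stopping condition is a single comparison. The function name `should_stop` is an assumption for illustration; only the inequality itself comes from the description above.

```python
def should_stop(current_loss, previous_loss, tol=1e-3):
    """Hypothetical helper: True when the loss improved by less than tol."""
    return current_loss > previous_loss - tol

# Improvement of 0.0005 is smaller than tol, so the condition is met.
print(should_stop(0.9995, 1.0))
# Improvement of 0.5 is larger than tol, so training continues.
print(should_stop(0.5, 1.0))
# With tol=0.0, any strict improvement keeps training going.
print(should_stop(0.9995, 1.0, tol=0.0))
```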

shuffle: boolean (default = True)

True: shuffles the training data after each epoch
False: does not shuffle the training data after each epoch

eta0: float (default = 0.001)

Initial learning rate

power_t: float (default = 0.5)

The exponent used to calculate the invscaling learning rate

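learning_rate: {'optimal', 'constant', 'invscaling', 'adaptive'} (default = 'constant')

'optimal': this option will be supported in a future version

'constant': keeps the learning rate constant

'adaptive': changes the learning rate if the training loss or the validation accuracy does not improve for n_iter_no_change epochs. The old learning rate is generally divided by 5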
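The schedules can be sketched in a few lines. The invscaling formula below, eta0 / t**power_t, is the one scikit-learn documents for its SGDRegressor; that cuML uses the same formula is an assumption. Both function names are hypothetical.

```python
def invscaling_rate(eta0, t, power_t=0.5):
    """Assumed invscaling schedule: eta0 / t**power_t, with t >= 1."""
    return eta0 / (t ** power_t)

def adaptive_rate(current_rate, epochs_without_improvement,
                  n_iter_no_change=5, factor=5.0):
    """Divide the rate by `factor` after n_iter_no_change stale epochs."""
    if epochs_without_improvement >= n_iter_no_change:
        return current_rate / factor
    return current_rate

# invscaling: the rate shrinks as the step counter t grows.
for t in (1, 4, 100):
    print(t, invscaling_rate(0.001, t))

# adaptive: unchanged until five epochs pass without improvement.
print(adaptive_rate(0.05, epochs_without_improvement=2))
print(adaptive_rate(0.05, epochs_without_improvement=5))
```

With 'constant', the rate simply stays at eta0 for the whole run, which is why the choice of eta0 matters so much under the default settings.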

n_iter_no_change: int (default = 5)

The number of epochs to train without any improvement in the model
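One common way to implement this kind of patience counter is shown below. It compares each epoch's loss with the best loss seen so far, which is an assumption borrowed from typical early-stopping logic rather than something confirmed for cuML. `epochs_until_stop` is a hypothetical helper.

```python
def epochs_until_stop(losses, tol=1e-3, n_iter_no_change=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    stale_epochs = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss > best_loss - tol:
            stale_epochs += 1   # no meaningful improvement this epoch
        else:
            stale_epochs = 0    # improvement resets the counter
        best_loss = min(best_loss, loss)
        if stale_epochs >= n_iter_no_change:
            return epoch
    return None

# The loss plateaus at 0.5 from the second epoch onwards.
print(epochs_until_stop([1.0, 0.5, 0.5, 0.5, 0.5], n_iter_no_change=3))
# The loss keeps improving, so no stop is triggered.
print(epochs_until_stop([1.0, 0.9, 0.8, 0.7], n_iter_no_change=3))
```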

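handle: cuml.Handle

Specifies the cuml.handle that holds the internal CUDA state for computations in this model. Most importantly, this specifies the CUDA stream that will be used for the model's computations, so users can run different models concurrently in different streams by creating handles in several streams. If None, a new one is created.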

verbose: int or boolean, default = False

Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more information.

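output_type: {'input', 'cudf', 'cupy', 'numpy', 'numba'}, default = None

Variable that controls the output type of the estimator's results and attributes. If None, it inherits the output type set at the module level, cuml.global_settings.output_type. See Output Data Type Configuration for more information.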

Notes:

For additional documentation, see scikit-learn's SGDRegressor.

Examples:

import numpy as np
import cudf
from cuml.linear_model import MBSGDRegressor as cumlMBSGDRegressor

# Training data: two float32 feature columns and a float32 target
X = cudf.DataFrame()
X['col1'] = np.array([1,1,2,2], dtype = np.float32)
X['col2'] = np.array([1,2,2,3], dtype = np.float32)
y = cudf.Series(np.array([1, 1, 2, 2], dtype=np.float32))

# Two new rows to predict on
pred_data = cudf.DataFrame()
pred_data['col1'] = np.asarray([3, 2], dtype=np.float32)
pred_data['col2'] = np.asarray([5, 5], dtype=np.float32)

# batch_size=1 and a larger eta0 than the default follow the tuning tips above;
# tol=0.0 stops training only when the loss no longer decreases
cu_mbsgd_regressor = cumlMBSGDRegressor(learning_rate='constant',
                                        eta0=0.05, epochs=2000,
                                        fit_intercept=True,
                                        batch_size=1, tol=0.0,
                                        penalty='l2',
                                        loss='squared_loss',
                                        alpha=0.5)
cu_mbsgd_regressor.fit(X, y)

# Copy the predictions from the GPU to a NumPy array for printing
cu_pred = cu_mbsgd_regressor.predict(pred_data).to_numpy()
print(" cuML intercept : ", cu_mbsgd_regressor.intercept_)
print(" cuML coef : ", cu_mbsgd_regressor.coef_)
print("cuML predictions : ", cu_pred)

Output:

cuML intercept :  0.7150013446807861
cuML coef :  0    0.27320495
            1     0.1875956
            dtype: float32
cuML predictions :  [2.4725943 2.1993892]
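As a sanity check, the printed predictions can be reproduced by hand from the printed intercept and coefficients, since a linear model predicts X @ coef + intercept. The sketch below uses plain NumPy with the values copied from the output above; it does not call cuML.

```python
import numpy as np

# Values copied from the printed output of the fitted model.
intercept = 0.7150013446807861
coef = np.array([0.27320495, 0.1875956])

# The same two rows that were passed to predict().
pred_data = np.array([[3.0, 5.0],
                      [2.0, 5.0]])

# Linear model prediction: dot product with the coefficients plus intercept.
manual_pred = pred_data @ coef + intercept
print(manual_pred)
```

The result agrees with the cuML predictions shown above up to float32 rounding.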

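Note: This article was selected and compiled by 純淨天空 from the original English documentation of cuml.MBSGDRegressor by rapids.ai. Unless otherwise stated, copyright in the original code belongs to the original author. Please do not reproduce or copy this translation without permission or authorization.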