Python cuml.MBSGDRegressor: usage and code examples

Usage: class cuml.MBSGDRegressor(*, loss='squared_loss', penalty='l2', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, epochs=1000, tol=0.001, shuffle=True, learning_rate='constant', eta0=0.001, power_t=0.5, batch_size=32, n_iter_no_change=5, handle=None, verbose=False, output_type=None)

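Fits a linear regression model by minimizing a regularized empirical loss with mini-batch SGD. The MBSGD Regressor implementation is experimental, and it uses a different algorithm than sklearn's SGDRegressor. To improve the results obtained from cuML's MBSGD Regressor:

* Reduce the batch size
* Increase eta0
* Increase the number of iterations

Because cuML processes the data in batches, a small eta0 may not let the model learn as much as scikit-learn's model does. Note also that reducing the batch size may increase the time required to fit the model.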
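As a rough illustration of the technique named above, here is a minimal NumPy sketch of mini-batch SGD for squared loss with an optional L2 penalty. It is not cuML's implementation (which runs on the GPU and, as noted, differs from scikit-learn's algorithm); the function name `minibatch_sgd_fit`, the gradient scaling, and the demo data are assumptions made for illustration only.

```python
import numpy as np

def minibatch_sgd_fit(X, y, eta0=0.05, epochs=200, batch_size=2, alpha=0.0, seed=0):
    """Illustrative mini-batch SGD (hypothetical helper, not the cuML API)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    coef = np.zeros(n_features)
    intercept = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle the rows every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ coef + intercept - y[idx]
            # Gradient of 0.5 * mean(residual**2) + 0.5 * alpha * ||coef||^2
            grad_coef = X[idx].T @ residual / len(idx) + alpha * coef
            grad_intercept = residual.mean()
            # Constant learning rate, as with learning_rate='constant'
            coef -= eta0 * grad_coef
            intercept -= eta0 * grad_intercept
    return coef, intercept

# Noise-free demo data generated from y = 2*x0 - 1*x1 + 0.5
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(64, 2))
y_demo = X_demo @ np.array([2.0, -1.0]) + 0.5

coef_demo, intercept_demo = minibatch_sgd_fit(X_demo, y_demo)
print(coef_demo, intercept_demo)
```

In this sketch, a smaller `batch_size` means more parameter updates per epoch, which is one way to read the advice about reducing the batch size and its cost in fitting time.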

Parameters:

loss: ‘squared_loss’ (default = ‘squared_loss’)

‘squared_loss’ uses linear regression

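penalty: {‘none’, ‘l1’, ‘l2’, ‘elasticnet’} (default = ‘l2’)

‘none’ performs no regularization

‘l1’ applies the L1 norm (Lasso), which minimizes the sum of the absolute values of the coefficients

‘l2’ applies the L2 norm (Ridge), which minimizes the sum of the squares of the coefficients

‘elasticnet’ applies Elastic Net regularization, which is a weighted average of the L1 and L2 norms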

alpha: float (default = 0.0001)

The constant value that determines the degree of regularization

fit_intercept: boolean (default = True)

If True, the model tries to correct for the global mean of y. If False, the model expects that you have already centered the data.
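To make "centered data" concrete, the NumPy sketch below fits a model without an intercept on mean-centered data and then recovers the intercept afterwards. It uses ordinary least squares rather than SGD purely so the result is deterministic; this is an assumption of the illustration, not a description of how cuML handles the intercept internally.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([1.0, 1.0, 2.0, 2.0])

# Center the features and the target around their means
X_mean = X.mean(axis=0)
y_mean = y.mean()
X_centered = X - X_mean
y_centered = y - y_mean

# Fit WITHOUT an intercept on the centered data
coef, *_ = np.linalg.lstsq(X_centered, y_centered, rcond=None)

# The intercept of the uncentered model can be recovered from the means
intercept = y_mean - X_mean @ coef

# Reference: fit WITH an explicit intercept column on the raw data
X_aug = np.column_stack([X, np.ones(len(X))])
full, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

print(coef, intercept)
print(full)
```

Both fits give the same coefficients, which is why a model with fit_intercept=False needs pre-centered inputs to be comparable.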

l1_ratio: float (default = 0.15)

l1_ratio is used only when penalty = ‘elasticnet’. Its value should satisfy 0 <= l1_ratio <= 1. When l1_ratio = 0 the penalty is equivalent to ‘l2’, and when l1_ratio = 1 it is equivalent to ‘l1’.
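The relationship between penalty, alpha and l1_ratio can be sketched as a small function. The helper `penalty_term` is hypothetical, and the exact scaling cuML applies (for example a factor of 0.5 on the L2 term) is not stated in this page, so the constants here are assumptions; only the limiting behavior at l1_ratio = 0 and l1_ratio = 1 comes from the description above.

```python
import numpy as np

def penalty_term(coef, penalty="l2", alpha=0.0001, l1_ratio=0.15):
    """Illustrative regularization term (hypothetical helper, not the cuML API)."""
    coef = np.asarray(coef, dtype=float)
    l1 = np.abs(coef).sum()      # sum of absolute values (Lasso)
    l2 = np.square(coef).sum()   # sum of squares (Ridge)
    if penalty == "none":
        return 0.0
    if penalty == "l1":
        return alpha * l1
    if penalty == "l2":
        return alpha * l2
    if penalty == "elasticnet":
        # Weighted average of the two norms, controlled by l1_ratio
        if not 0.0 <= l1_ratio <= 1.0:
            raise ValueError("l1_ratio must be in [0, 1]")
        return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)
    raise ValueError(f"unknown penalty: {penalty!r}")

w = [1.0, -2.0]
print(penalty_term(w, "elasticnet", alpha=1.0, l1_ratio=0.0))  # same as 'l2'
print(penalty_term(w, "elasticnet", alpha=1.0, l1_ratio=1.0))  # same as 'l1'
```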

batch_size: int (default = 32)

Sets the number of samples included in each batch.
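A quick way to see how batch_size affects the amount of work per epoch is to count the batches. The helpers below are hypothetical, and whether cuML keeps a final partial batch is an assumption of this sketch.

```python
import math

def batches_per_epoch(n_samples, batch_size=32):
    """Number of parameter updates in one pass over the data (illustrative)."""
    return math.ceil(n_samples / batch_size)

def batch_slices(n_samples, batch_size=32):
    """(start, stop) row ranges for each batch; the last one may be shorter."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]

print(batches_per_epoch(1000, batch_size=32))
print(batches_per_epoch(1000, batch_size=8))
print(batch_slices(70, batch_size=32))
```

Under these assumptions, shrinking the batch from 32 to 8 roughly quadruples the number of updates per epoch.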

epochs: int (default = 1000)

The number of times the model should iterate over the entire dataset during training

tol: float (default = 1e-3)

Training stops if current_loss > previous_loss - tol
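The stopping condition can be written as a one-line predicate. The function name `should_stop` is hypothetical, and how this check interacts with n_iter_no_change inside cuML is not specified here, so the sketch covers only the comparison itself.

```python
def should_stop(current_loss, previous_loss, tol=1e-3):
    """True when the loss did not improve by at least `tol` (illustrative)."""
    return current_loss > previous_loss - tol

# Improvement of 0.0005 is smaller than tol=0.001, so training would stop
print(should_stop(0.9995, 1.0))
# Improvement of 0.1 is larger than tol, so training continues
print(should_stop(0.9, 1.0))
# With tol=0.0, any strict decrease in loss keeps training going
print(should_stop(0.9995, 1.0, tol=0.0))
```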

shuffle: boolean (default = True)

True shuffles the training data after each epoch. False does not shuffle the training data after each epoch.

eta0: float (default = 0.001)

The initial learning rate

power_t: float (default = 0.5)

The exponent used for calculating the invscaling learning rate
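This page does not spell out the invscaling formula. The sketch below assumes the schedule documented for scikit-learn's SGDRegressor, eta = eta0 / t^power_t, where t is the update counter; treat both the formula and the helper name as assumptions with respect to cuML.

```python
def invscaling_learning_rate(eta0, t, power_t=0.5):
    """Inverse-scaling schedule eta0 / t**power_t (assumed, following scikit-learn)."""
    if t < 1:
        raise ValueError("t must be >= 1")
    return eta0 / (t ** power_t)

# The rate starts at eta0 and decays as the step counter grows
for step in (1, 4, 100):
    print(step, invscaling_learning_rate(0.001, step))
```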

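learning_rate: {‘optimal’, ‘constant’, ‘invscaling’, ‘adaptive’} (default = ‘constant’)

The optimal option will be supported in a future version

constant keeps the learning rate constant

adaptive changes the learning rate if the training loss or the validation accuracy does not improve for n_iter_no_change epochs. The old learning rate is generally divided by 5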
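The adaptive rule can be sketched as a function that replays a sequence of per-epoch losses and reports the learning rate used after each one. The helper `adaptive_schedule` is hypothetical, and details such as resetting the stall counter after a reduction are assumptions; only the "divide by 5 after n_iter_no_change epochs without improvement" behavior comes from the description above.

```python
def adaptive_schedule(losses, eta0=0.001, n_iter_no_change=5, factor=5.0):
    """Learning rate after each epoch under an adaptive rule (illustrative)."""
    eta = eta0
    best = float("inf")
    stalled = 0
    etas = []
    for loss in losses:
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            stalled = 0
        else:
            stalled += 1     # no improvement this epoch
        if stalled >= n_iter_no_change:
            eta /= factor    # shrink the learning rate
            stalled = 0      # assumption: start counting again afterwards
        etas.append(eta)
    return etas

print(adaptive_schedule([1.0, 0.9, 0.9, 0.9, 0.8], eta0=0.5, n_iter_no_change=2))
```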

n_iter_no_change: int (default = 5)

The number of epochs to train without any improvement in the model

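handle: cuml.Handle

Specifies the cuml.handle that holds the internal CUDA state for computations in this model. Most importantly, this specifies the CUDA stream that will be used for the model's computations, so users can run different models concurrently in different streams by creating handles in several streams. If it is None, a new one is created.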

verbose: int or boolean, default=False

Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more information.

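output_type: {‘input’, ‘cudf’, ‘cupy’, ‘numpy’, ‘numba’}, default=None

Variable that controls the output type of the estimator's results and attributes. If None, it inherits the output type set at the module level, cuml.global_settings.output_type. See Output Data Type Configuration for more information.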

Notes:

For additional documentation, see scikit-learn's SGDRegressor.

Examples:

import numpy as np
import cudf
from cuml.linear_model import MBSGDRegressor as cumlMBSGDRegressor

# Training data: two float32 feature columns and a float32 target
X = cudf.DataFrame()
X['col1'] = np.array([1,1,2,2], dtype = np.float32)
X['col2'] = np.array([1,2,2,3], dtype = np.float32)
y = cudf.Series(np.array([1, 1, 2, 2], dtype=np.float32))

# Rows to predict on, with the same columns as X
pred_data = cudf.DataFrame()
pred_data['col1'] = np.asarray([3, 2], dtype=np.float32)
pred_data['col2'] = np.asarray([5, 5], dtype=np.float32)

# batch_size=1 with a larger eta0 and more epochs than the defaults
cu_mbsgd_regressor = cumlMBSGDRegressor(learning_rate='constant',
                                        eta0=0.05, epochs=2000,
                                        fit_intercept=True,
                                        batch_size=1, tol=0.0,
                                        penalty='l2',
                                        loss='squared_loss',
                                        alpha=0.5)
cu_mbsgd_regressor.fit(X, y)

# Predict, then copy the result from the GPU into a NumPy array
cu_pred = cu_mbsgd_regressor.predict(pred_data).to_numpy()
print(" cuML intercept : ", cu_mbsgd_regressor.intercept_)
print(" cuML coef : ", cu_mbsgd_regressor.coef_)
print("cuML predictions : ", cu_pred)

Output:

cuML intercept :  0.7150013446807861
cuML coef :  0    0.27320495
            1     0.1875956
            dtype: float32
cuML predictions :  [2.4725943 2.1993892]
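The printed predictions can be checked by hand, since a linear model predicts X @ coef + intercept. The sketch below uses plain NumPy and the values copied from the output above, so it runs without a GPU; the variable names are chosen for illustration.

```python
import numpy as np

# Values copied from the printed output above
intercept = 0.7150013446807861
coef = np.array([0.27320495, 0.1875956])

# Same rows as pred_data in the example
pred_rows = np.array([[3.0, 5.0],
                      [2.0, 5.0]])

# A linear model's prediction is the dot product plus the intercept
manual_pred = pred_rows @ coef + intercept
print(manual_pred)
```

The result agrees with the cuML predictions to within float32 rounding.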

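Note: This article was selected and compiled by 纯净天空 from the original English work cuml.MBSGDRegressor by rapids.ai. Unless otherwise stated, copyright of the original code belongs to the original author. Do not reproduce or copy this translation without permission or authorization.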