Python cuml.LogisticRegression usage and code examples

Usage: class cuml.LogisticRegression(*, penalty='l2', tol=0.0001, C=1.0, fit_intercept=True, class_weight=None, max_iter=1000, linesearch_max_iter=50, verbose=False, l1_ratio=None, solver='qn', handle=None, output_type=None)

LogisticRegression is a linear model used to model the probability of certain events occurring, for example the probability that an event succeeds or fails.

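In addition to cuDF objects, cuML's LogisticRegression accepts array-like objects either on the host as NumPy arrays or on the device (as Numba arrays or any __cuda_array_interface__-compliant object). It provides both a single-class variant (using sigmoid loss) and a multiple-class variant (using softmax loss), depending on the input variables.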
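To make the two loss variants concrete, here is a minimal pure-Python sketch of the sigmoid and softmax functions behind them. It is illustrative only and is not cuML's implementation; the function names `sigmoid` and `softmax` are chosen for this example.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), numerically stable."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(scores):
    """Map a list of per-class scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# With two classes and scores [0, z], softmax reduces to the sigmoid of z.
p_binary = sigmoid(2.0)
p_two_class = softmax([0.0, 2.0])[1]
```

The last two lines show why the single-class case can be treated as a special case of the multiple-class one: both expressions give the same probability for the positive class.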

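Only one solver option is currently available: the Quasi-Newton (QN) algorithm. Although it is exposed as a single option, this solver resolves to one of two different algorithms underneath:

Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) if there is L1 regularization
Limited-memory BFGS (L-BFGS) otherwise.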

Note that, just like in Scikit-learn, the bias is not regularized.

Parameters:

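penalty: ‘none’, ‘l1’, ‘l2’, ‘elasticnet’ (default = ‘l2’):

Specifies the norm used in the penalization. If ‘none’ or ‘l2’ is selected, the L-BFGS solver is used. If ‘l1’ is selected, the OWL-QN solver is used. If ‘elasticnet’ is selected, OWL-QN is used if l1_ratio > 0, otherwise L-BFGS is used.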
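The selection rule described for penalty can be written down as a small lookup. The helper below, `resolve_qn_algorithm`, is hypothetical and not part of cuML; it only restates the rule in code. Raising an error when penalty is ‘elasticnet’ and l1_ratio is missing is a choice made for this sketch.

```python
def resolve_qn_algorithm(penalty="l2", l1_ratio=None):
    """Return the name of the algorithm the QN solver would resolve to.

    Hypothetical helper for illustration; it mirrors the documented rule only.
    """
    if penalty in ("none", "l2"):
        return "L-BFGS"
    if penalty == "l1":
        return "OWL-QN"
    if penalty == "elasticnet":
        if l1_ratio is None:
            # Assumption of this sketch: a mixing ratio is required here.
            raise ValueError("l1_ratio must be set when penalty='elasticnet'")
        return "OWL-QN" if l1_ratio > 0 else "L-BFGS"
    raise ValueError(f"unknown penalty: {penalty!r}")


for args in [("none", None), ("l2", None), ("l1", None),
             ("elasticnet", 0.0), ("elasticnet", 0.5)]:
    print(args, "->", resolve_qn_algorithm(*args))
```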

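tol: float (default = 1e-4):

Tolerance for the stopping criteria. The exact stopping conditions depend on the chosen solver. See the solver's documentation for more details:

cuml.QN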

C: float (default = 1.0):

Inverse of regularization strength; must be a positive float.

fit_intercept: boolean (default = True):

If True, the model tries to correct for the global mean of y. If False, the model expects that you have centered the data.

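class_weight: dict or ‘balanced’, default=None:

By default, all classes have a weight of one. However, a dictionary of the form {class_label: weight} can be provided to associate weights with classes. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)). Note that these weights are multiplied by sample_weight (passed through the fit method) if sample_weight is specified.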
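As a worked illustration of the “balanced” formula, the following pure-Python sketch computes n_samples / (n_classes * count) per class and then multiplies the result by per-sample weights. The helper names `balanced_class_weights` and `effective_sample_weights` are made up for this example and are not cuML functions.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def effective_sample_weights(y, class_weight, sample_weight=None):
    """Multiply each sample's class weight by its sample weight (1.0 if absent)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return [class_weight[label] * w for label, w in zip(y, sample_weight)]


y = [0, 0, 0, 1]                      # class 0 is three times as frequent
weights = balanced_class_weights(y)   # the rare class gets the larger weight
per_sample = effective_sample_weights(y, weights, [1.0, 1.0, 1.0, 0.5])
print(weights, per_sample)
```

Here class 0 receives a weight of 4 / (2 * 3) and class 1 a weight of 4 / (2 * 1), so the minority class counts three times as much per sample.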

max_iter: int (default = 1000):

Maximum number of iterations allowed for the solver to converge.

linesearch_max_iter: int (default = 50):

Maximum number of line search iterations per outer iteration, used in the lbfgs and owl QN solvers.

verbose: int or boolean, default=False

Sets the logging level. It must be one of cuml.common.logger.level_*. See Verbosity Levels for more information.

l1_ratio: float or None, optional (default=None):

The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1.
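The sketch below shows one common way C and l1_ratio can combine into separate L1 and L2 strengths, and how a penalty that leaves the bias unregularized might be evaluated. It is an assumption-laden illustration rather than cuML's code: the helpers `split_regularization` and `penalty_term`, the total strength 1 / C, and the 0.5 factor on the L2 term are conventions picked for this example.

```python
def split_regularization(C=1.0, penalty="l2", l1_ratio=None):
    """Return (l1_strength, l2_strength) under the conventions assumed here."""
    if C <= 0:
        raise ValueError("C must be a positive float")
    strength = 1.0 / C  # C is the inverse of the regularization strength
    if penalty == "none":
        return 0.0, 0.0
    if penalty == "l1":
        return strength, 0.0
    if penalty == "l2":
        return 0.0, strength
    if penalty == "elasticnet":
        if l1_ratio is None or not 0.0 <= l1_ratio <= 1.0:
            raise ValueError("l1_ratio must satisfy 0 <= l1_ratio <= 1")
        return l1_ratio * strength, (1.0 - l1_ratio) * strength
    raise ValueError(f"unknown penalty: {penalty!r}")


def penalty_term(weights, l1_strength, l2_strength):
    """Penalty over the feature weights only; the intercept is not included."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return l1_strength * l1 + 0.5 * l2_strength * l2


l1_s, l2_s = split_regularization(C=2.0, penalty="elasticnet", l1_ratio=0.25)
print(l1_s, l2_s, penalty_term([1.0, -2.0], l1_s, l2_s))
```

A smaller C gives a larger total strength, and l1_ratio = 0 or 1 reduces the mix to a pure L2 or pure L1 penalty respectively.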

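solver: ‘qn’, ‘lbfgs’, ‘owl’ (default = ‘qn’):

Algorithm to use in the optimization problem. Currently only qn is supported, which automatically selects either L-BFGS or OWL-QN depending on the L1 regularization conditions described above. The options ‘lbfgs’ and ‘owl’ are just convenience values that end up using the same solver following the same rules.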

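handle: cuml.Handle

Specifies the cuml.handle that holds the internal CUDA state for computations in this model. Most importantly, this specifies the CUDA stream that will be used for the model's computations, so users can run different models concurrently in different streams by creating handles in several streams. If it is None, a new one is created.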

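output_type: {‘input’, ‘cudf’, ‘cupy’, ‘numpy’, ‘numba’}, default=None

Variable that controls the output type of the results and attributes of the estimator. If None, it inherits the output type set at the module level, cuml.global_settings.output_type. See Output Data Type Configuration for more information.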

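Notes:

cuML's LogisticRegression uses a different solver than the equivalent Scikit-learn estimator, except when there is no penalty and solver=lbfgs is used in Scikit-learn. This can cause (small) differences in the coefficients and predictions of the model, similar to using different solvers in Scikit-learn.

For additional information, see Scikit-learn's LogisticRegression.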

Examples:

import cudf
import numpy as np

# Both import methods supported
# from cuml import LogisticRegression
from cuml.linear_model import LogisticRegression

X = cudf.DataFrame()
X['col1'] = np.array([1,1,2,2], dtype = np.float32)
X['col2'] = np.array([1,2,2,3], dtype = np.float32)
y = cudf.Series( np.array([0.0, 0.0, 1.0, 1.0], dtype = np.float32) )

reg = LogisticRegression()
reg.fit(X,y)

print("Coefficients:")
print(reg.coef_)
print("Intercept:")
print(reg.intercept_)

X_new = cudf.DataFrame()
X_new['col1'] = np.array([1,5], dtype = np.float32)
X_new['col2'] = np.array([2,5], dtype = np.float32)

preds = reg.predict(X_new)

print("Predictions:")
print(preds)

Output:

Coefficients:
            0.22309814
            0.21012752
Intercept:
            -0.7548761
Predictions:
            0    0.0
            1    1.0

Attributes:

coef_: dev array, dim (n_classes, n_features) or (n_classes, n_features+1): The estimated coefficients of the logistic regression model.
intercept_: device array (n_classes, 1): The independent term. If fit_intercept is False, it is 0.
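To see how coef_ and intercept_ relate to the predictions, the sketch below recomputes the two predictions from the example by hand, using the coefficient and intercept values printed in the output above. It assumes the usual binary decision rule (predict class 1 when the sigmoid of the linear score is at least 0.5); the function names are illustrative and this is not cuML code.

```python
import math

# Values copied from the example output above.
coef = [0.22309814, 0.21012752]
intercept = -0.7548761


def predict_proba_positive(x):
    """Sigmoid of the linear score w . x + b."""
    score = sum(w * xi for w, xi in zip(coef, x)) + intercept
    return 1.0 / (1.0 + math.exp(-score))


def predict(x, threshold=0.5):
    """Assumed decision rule: class 1.0 if the probability reaches the threshold."""
    return 1.0 if predict_proba_positive(x) >= threshold else 0.0


X_new = [[1.0, 2.0], [5.0, 5.0]]  # same rows as X_new in the example
preds = [predict(row) for row in X_new]
print(preds)  # [0.0, 1.0], matching the predictions in the example output
```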


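Note: This article was selected and compiled by 纯净天空 from the original English work cuml.LogisticRegression by rapids.ai. Unless otherwise stated, copyright of the original code belongs to the original author. Do not reproduce or copy this translation without permission or authorization.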