Python pyspark LogisticRegression用法及代码示例

本文简要介绍 pyspark.ml.classification.LogisticRegression 的用法。

用法: class pyspark.ml.classification.LogisticRegression(*, featuresCol='features', labelCol='label', predictionCol='prediction', maxIter=100, regParam=0.0, elasticNetParam=0.0, tol=1e-06, fitIntercept=True, threshold=0.5, thresholds=None, probabilityCol='probability', rawPredictionCol='rawPrediction', standardization=True, weightCol=None, aggregationDepth=2, family='auto', lowerBoundsOnCoefficients=None, upperBoundsOnCoefficients=None, lowerBoundsOnIntercepts=None, upperBoundsOnIntercepts=None, maxBlockSizeInMB=0.0)

逻辑回归。此类支持多项逻辑 (softmax) 和二项逻辑回归。

版本 1.3.0 中的新函数。

例子：

>>> from pyspark.sql import Row
>>> from pyspark.ml.linalg import Vectors
>>> bdf = sc.parallelize([
...     Row(label=1.0, weight=1.0, features=Vectors.dense(0.0, 5.0)),
...     Row(label=0.0, weight=2.0, features=Vectors.dense(1.0, 2.0)),
...     Row(label=1.0, weight=3.0, features=Vectors.dense(2.0, 1.0)),
...     Row(label=0.0, weight=4.0, features=Vectors.dense(3.0, 3.0))]).toDF()
>>> blor = LogisticRegression(weightCol="weight")
>>> blor.getRegParam()
0.0
>>> blor.setRegParam(0.01)
LogisticRegression...
>>> blor.getRegParam()
0.01
>>> blor.setMaxIter(10)
LogisticRegression...
>>> blor.getMaxIter()
10
>>> blor.clear(blor.maxIter)
>>> blorModel = blor.fit(bdf)
>>> blorModel.setFeaturesCol("features")
LogisticRegressionModel...
>>> blorModel.setProbabilityCol("newProbability")
LogisticRegressionModel...
>>> blorModel.getProbabilityCol()
'newProbability'
>>> blorModel.getMaxBlockSizeInMB()
0.0
>>> blorModel.setThreshold(0.1)
LogisticRegressionModel...
>>> blorModel.getThreshold()
0.1
>>> blorModel.coefficients
DenseVector([-1.080..., -0.646...])
>>> blorModel.intercept
3.112...
>>> blorModel.evaluate(bdf).accuracy == blorModel.summary.accuracy
True
>>> data_path = "data/mllib/sample_multiclass_classification_data.txt"
>>> mdf = spark.read.format("libsvm").load(data_path)
>>> mlor = LogisticRegression(regParam=0.1, elasticNetParam=1.0, family="multinomial")
>>> mlorModel = mlor.fit(mdf)
>>> mlorModel.coefficientMatrix
SparseMatrix(3, 4, [0, 1, 2, 3], [3, 2, 1], [1.87..., -2.75..., -0.50...], 1)
>>> mlorModel.interceptVector
DenseVector([0.04..., -0.42..., 0.37...])
>>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0, 1.0))]).toDF()
>>> blorModel.predict(test0.head().features)
1.0
>>> blorModel.predictRaw(test0.head().features)
DenseVector([-3.54..., 3.54...])
>>> blorModel.predictProbability(test0.head().features)
DenseVector([0.028, 0.972])
>>> result = blorModel.transform(test0).head()
>>> result.prediction
1.0
>>> result.newProbability
DenseVector([0.02..., 0.97...])
>>> result.rawPrediction
DenseVector([-3.54..., 3.54...])
>>> test1 = sc.parallelize([Row(features=Vectors.sparse(2, [0], [1.0]))]).toDF()
>>> blorModel.transform(test1).head().prediction
1.0
>>> blor.setParams("vector")
Traceback (most recent call last):
    ...
TypeError: Method setParams forces keyword arguments.
>>> lr_path = temp_path + "/lr"
>>> blor.save(lr_path)
>>> lr2 = LogisticRegression.load(lr_path)
>>> lr2.getRegParam()
0.01
>>> model_path = temp_path + "/lr_model"
>>> blorModel.save(model_path)
>>> model2 = LogisticRegressionModel.load(model_path)
>>> blorModel.coefficients[0] == model2.coefficients[0]
True
>>> blorModel.intercept == model2.intercept
True
>>> model2
LogisticRegressionModel: uid=..., numClasses=2, numFeatures=2
>>> blorModel.transform(test0).take(1) == model2.transform(test0).take(1)
True

相关用法

注：本文由纯净天空筛选整理自spark.apache.org大神的英文原创作品 pyspark.ml.classification.LogisticRegression。非经特殊声明，原始代码版权归原作者所有，本译文未经允许或授权，请勿转载或复制。