Python pyspark LinearRegression用法及代碼示例

本文簡要介紹 pyspark.ml.regression.LinearRegression 的用法。

用法: class pyspark.ml.regression.LinearRegression(*, featuresCol='features', labelCol='label', predictionCol='prediction', maxIter=100, regParam=0.0, elasticNetParam=0.0, tol=1e-06, fitIntercept=True, standardization=True, solver='auto', weightCol=None, aggregationDepth=2, loss='squaredError', epsilon=1.35, maxBlockSizeInMB=0.0)

線性回歸。

學習目標是通過正則化最小化指定的損失函數。這支持兩種損失：

squaredError(又稱平方損失)
huber(相對較小誤差的平方誤差和相對較大誤差的絕對誤差的混合體，我們從訓練數據中估計尺度參數)

這支持多種類型的正則化：

無(又名普通最小二乘法)
L2(嶺回歸)
L1(套索)
L2 + L1(彈性網)

1.4.0 版中的新函數。

注意：

使用 huber 損失擬合隻支持 none 和 L2 正則化。

例子：

>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame([
...     (1.0, 2.0, Vectors.dense(1.0)),
...     (0.0, 2.0, Vectors.sparse(1, [], []))], ["label", "weight", "features"])
>>> lr = LinearRegression(regParam=0.0, solver="normal", weightCol="weight")
>>> lr.setMaxIter(5)
LinearRegression...
>>> lr.getMaxIter()
5
>>> lr.setRegParam(0.1)
LinearRegression...
>>> lr.getRegParam()
0.1
>>> lr.setRegParam(0.0)
LinearRegression...
>>> model = lr.fit(df)
>>> model.setFeaturesCol("features")
LinearRegressionModel...
>>> model.setPredictionCol("newPrediction")
LinearRegressionModel...
>>> model.getMaxIter()
5
>>> model.getMaxBlockSizeInMB()
0.0
>>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
>>> abs(model.predict(test0.head().features) - (-1.0)) < 0.001
True
>>> abs(model.transform(test0).head().newPrediction - (-1.0)) < 0.001
True
>>> abs(model.coefficients[0] - 1.0) < 0.001
True
>>> abs(model.intercept - 0.0) < 0.001
True
>>> test1 = spark.createDataFrame([(Vectors.sparse(1, [0], [1.0]),)], ["features"])
>>> abs(model.transform(test1).head().newPrediction - 1.0) < 0.001
True
>>> lr.setParams(featuresCol="vector")
LinearRegression...
>>> lr_path = temp_path + "/lr"
>>> lr.save(lr_path)
>>> lr2 = LinearRegression.load(lr_path)
>>> lr2.getMaxIter()
5
>>> model_path = temp_path + "/lr_model"
>>> model.save(model_path)
>>> model2 = LinearRegressionModel.load(model_path)
>>> model.coefficients[0] == model2.coefficients[0]
True
>>> model.intercept == model2.intercept
True
>>> model.transform(test0).take(1) == model2.transform(test0).take(1)
True
>>> model.numFeatures
1
>>> model.write().format("pmml").save(model_path + "_2")

相關用法

注：本文由純淨天空篩選整理自spark.apache.org大神的英文原創作品 pyspark.ml.regression.LinearRegression。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。