This article briefly introduces the usage of pyspark.ml.regression.LinearRegression.

Usage:
class pyspark.ml.regression.LinearRegression(*, featuresCol='features', labelCol='label', predictionCol='prediction', maxIter=100, regParam=0.0, elasticNetParam=0.0, tol=1e-06, fitIntercept=True, standardization=True, solver='auto', weightCol=None, aggregationDepth=2, loss='squaredError', epsilon=1.35, maxBlockSizeInMB=0.0)
Linear regression.

The learning objective is to minimize the specified loss function, with regularization. Two loss functions are supported:
squaredError (a.k.a. squared loss)
huber (a hybrid of squared error for relatively small errors and absolute error for relatively large errors; the scale parameter is estimated from the training data)
Multiple types of regularization are supported:
none (a.k.a. ordinary least squares)
L2 (ridge regression)
L1 (Lasso)
L2 + L1 (elastic net)
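The four cases above are all selected through the `regParam` and `elasticNetParam` parameters. As an illustrative sketch in plain Python (not Spark's implementation), the combined elastic-net penalty can be written as follows, where `reg_param` and `elastic_net_param` mirror the estimator's `regParam` and `elasticNetParam`:

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Elastic-net regularization term in the form used by Spark ML:
    regParam * (elasticNetParam * ||w||_1 + (1 - elasticNetParam) / 2 * ||w||_2^2).

    elastic_net_param = 0.0 gives pure L2 (ridge), 1.0 gives pure L1 (Lasso),
    values in between mix the two; reg_param = 0.0 disables regularization (OLS).
    """
    l1 = sum(abs(w) for w in weights)        # L1 norm of the weights
    l2 = sum(w * w for w in weights)         # squared L2 norm of the weights
    return reg_param * (elastic_net_param * l1 + (1 - elastic_net_param) / 2 * l2)

w = [1.0, -2.0]
print(elastic_net_penalty(w, 0.0, 0.0))  # none: no penalty at all
print(elastic_net_penalty(w, 0.1, 0.0))  # L2 only (ridge)
print(elastic_net_penalty(w, 0.1, 1.0))  # L1 only (Lasso)
print(elastic_net_penalty(w, 0.1, 0.5))  # L2 + L1 (elastic net)
```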
New in version 1.4.0.
Note:
Fitting with huber loss only supports none and L2 regularization.
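The huber loss described above is quadratic for small residuals and linear for large ones, with the switch controlled by a threshold. As an illustrative sketch in plain Python (not Spark's implementation), where `epsilon` plays the role of the estimator's `epsilon` parameter (default 1.35):

```python
def huber_loss(residual, epsilon=1.35):
    """Huber loss for a single residual.

    Quadratic (squared-error) for |residual| <= epsilon, linear
    (absolute-error) beyond it; the two branches meet continuously
    at |residual| == epsilon.
    """
    r = abs(residual)
    if r <= epsilon:
        return 0.5 * r * r                     # squared-error regime
    return epsilon * (r - 0.5 * epsilon)       # absolute-error regime

print(huber_loss(0.5))   # small residual: like squared error
print(huber_loss(10.0))  # large residual: grows only linearly
```

This robustness to large residuals is why huber is often preferred when the training data contains outliers.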
Examples:
>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame([
...     (1.0, 2.0, Vectors.dense(1.0)),
...     (0.0, 2.0, Vectors.sparse(1, [], []))], ["label", "weight", "features"])
>>> lr = LinearRegression(regParam=0.0, solver="normal", weightCol="weight")
>>> lr.setMaxIter(5)
LinearRegression...
>>> lr.getMaxIter()
5
>>> lr.setRegParam(0.1)
LinearRegression...
>>> lr.getRegParam()
0.1
>>> lr.setRegParam(0.0)
LinearRegression...
>>> model = lr.fit(df)
>>> model.setFeaturesCol("features")
LinearRegressionModel...
>>> model.setPredictionCol("newPrediction")
LinearRegressionModel...
>>> model.getMaxIter()
5
>>> model.getMaxBlockSizeInMB()
0.0
>>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
>>> abs(model.predict(test0.head().features) - (-1.0)) < 0.001
True
>>> abs(model.transform(test0).head().newPrediction - (-1.0)) < 0.001
True
>>> abs(model.coefficients[0] - 1.0) < 0.001
True
>>> abs(model.intercept - 0.0) < 0.001
True
>>> test1 = spark.createDataFrame([(Vectors.sparse(1, [0], [1.0]),)], ["features"])
>>> abs(model.transform(test1).head().newPrediction - 1.0) < 0.001
True
>>> lr.setParams(featuresCol="vector")
LinearRegression...
>>> lr_path = temp_path + "/lr"
>>> lr.save(lr_path)
>>> lr2 = LinearRegression.load(lr_path)
>>> lr2.getMaxIter()
5
>>> model_path = temp_path + "/lr_model"
>>> model.save(model_path)
>>> model2 = LinearRegressionModel.load(model_path)
>>> model.coefficients[0] == model2.coefficients[0]
True
>>> model.intercept == model2.intercept
True
>>> model.transform(test0).take(1) == model2.transform(test0).take(1)
True
>>> model.numFeatures
1
>>> model.write().format("pmml").save(model_path + "_2")
Related usage
- Python pyspark LinearRegressionModel usage and code examples
- Python pyspark LinearSVC usage and code examples
- Python pyspark LDA.setLearningDecay usage and code examples
- Python pyspark LogisticRegressionWithLBFGS.train usage and code examples
- Python pyspark LDA.setDocConcentration usage and code examples
- Python pyspark LDA usage and code examples
- Python pyspark LDAModel usage and code examples
- Python pyspark LDA.setOptimizer usage and code examples
- Python pyspark LDA.setK usage and code examples
- Python pyspark LDA.setLearningOffset usage and code examples
- Python pyspark LDA.setTopicDistributionCol usage and code examples
- Python pyspark LassoModel usage and code examples
- Python pyspark LogisticRegressionModel usage and code examples
- Python pyspark LogisticRegression usage and code examples
- Python pyspark LDA.setKeepLastCheckpoint usage and code examples
- Python pyspark LDA.setSubsamplingRate usage and code examples
- Python pyspark LDA.setTopicConcentration usage and code examples
- Python pyspark LDA.setOptimizeDocConcentration usage and code examples
- Python pyspark create_map usage and code examples
- Python pyspark date_add usage and code examples
- Python pyspark DataFrame.to_latex usage and code examples
- Python pyspark DataStreamReader.schema usage and code examples
- Python pyspark MultiIndex.size usage and code examples
- Python pyspark arrays_overlap usage and code examples
- Python pyspark Series.asof usage and code examples
Note: This article is curated and translated by 純淨天空 from the original English work pyspark.ml.regression.LinearRegression on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to its original authors; please do not reproduce or copy this translation without permission or authorization.