This article briefly describes the usage of pyspark.mllib.regression.RidgeRegressionModel.

Usage:
class pyspark.mllib.regression.RidgeRegressionModel(weights, intercept)
A linear regression model derived from a least-squares fit with an l_2 penalty term.
New in version 0.9.0.
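To make the description concrete, here is a minimal plain-Python sketch of what a linear model with weights and an intercept predicts, and of a least-squares objective with an l_2 penalty. It is not Spark's implementation: the function names `predict` and `ridge_objective`, the `reg_param` argument, and the 1/2 scaling factors are assumptions chosen for illustration, and the weights are hand-picked rather than trained. The four data points are the ones used in this page's example.

```python
def predict(weights, intercept, x):
    """Return dot(weights, x) + intercept for a single feature vector."""
    return sum(w * v for w, v in zip(weights, x)) + intercept


def ridge_objective(weights, intercept, features, labels, reg_param):
    """Half the mean squared error plus (reg_param / 2) * ||weights||^2.

    The scaling factors are an assumption made for this sketch.
    """
    n = len(labels)
    squared_error = sum(
        (predict(weights, intercept, x) - y) ** 2
        for x, y in zip(features, labels)
    )
    data_loss = squared_error / (2.0 * n)
    penalty = 0.5 * reg_param * sum(w * w for w in weights)
    return data_loss + penalty


# The same four (label, feature) pairs as in the example on this page.
features = [[0.0], [1.0], [2.0], [3.0]]
labels = [0.0, 1.0, 3.0, 2.0]

# Hypothetical, hand-picked parameters (not the result of training).
weights, intercept = [1.0], 0.0

prediction = predict(weights, intercept, [1.0])
objective = ridge_objective(weights, intercept, features, labels, reg_param=0.01)
print(prediction, objective)
```

A larger `reg_param` makes the penalty weigh more heavily, which pushes the fitted weights toward zero; the intercept is left out of the penalty in this sketch.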
Examples:
>>> # `sc` is assumed to be an existing SparkContext.
>>> import numpy as np
>>> from pyspark.mllib.linalg import SparseVector
>>> from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
>>> from pyspark.mllib.regression import RidgeRegressionModel, RidgeRegressionWithSGD
>>> # Train on dense feature vectors.
>>> data = [
...     LabeledPoint(0.0, [0.0]),
...     LabeledPoint(1.0, [1.0]),
...     LabeledPoint(3.0, [2.0]),
...     LabeledPoint(2.0, [3.0])
... ]
>>> lrm = RidgeRegressionWithSGD.train(sc.parallelize(data), iterations=10,
...     initialWeights=np.array([1.0]))
>>> abs(lrm.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(lrm.predict(np.array([1.0])) - 1) < 0.5
True
>>> abs(lrm.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
>>> abs(lrm.predict(sc.parallelize([[1.0]])).collect()[0] - 1) < 0.5
True
>>> # Save the model, load it back, and check that predictions still hold.
>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> lrm.save(sc, path)
>>> sameModel = RidgeRegressionModel.load(sc, path)
>>> abs(sameModel.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(sameModel.predict(np.array([1.0])) - 1) < 0.5
True
>>> abs(sameModel.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass
>>> # Train on sparse feature vectors.
>>> data = [
...     LabeledPoint(0.0, SparseVector(1, {0: 0.0})),
...     LabeledPoint(1.0, SparseVector(1, {0: 1.0})),
...     LabeledPoint(3.0, SparseVector(1, {0: 2.0})),
...     LabeledPoint(2.0, SparseVector(1, {0: 3.0}))
... ]
>>> lrm = LinearRegressionWithSGD.train(sc.parallelize(data), iterations=10,
...     initialWeights=np.array([1.0]))
>>> abs(lrm.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(lrm.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
>>> # Train with all optional parameters given explicitly.
>>> lrm = RidgeRegressionWithSGD.train(sc.parallelize(data), iterations=10, step=1.0,
...     regParam=0.01, miniBatchFraction=1.0, initialWeights=np.array([1.0]), intercept=True,
...     validateData=True)
>>> abs(lrm.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(lrm.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
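The example calls predict with both a dense array and a SparseVector and expects the same result. The sketch below shows in plain Python, without Spark, why the two inputs are interchangeable for a linear model: entries missing from the sparse form are zeros and add nothing to the dot product. The helper names `predict_dense` and `predict_sparse`, the `{index: value}` dictionary layout, and the parameter values are all hypothetical, chosen only for illustration.

```python
def predict_dense(weights, intercept, x):
    """dot(weights, x) + intercept for a dense feature list."""
    return sum(w * v for w, v in zip(weights, x)) + intercept


def predict_sparse(weights, intercept, size, entries):
    """Same prediction from a sparse {index: value} mapping.

    Indices absent from `entries` are treated as zeros and skipped.
    """
    if size != len(weights):
        raise ValueError("feature vector size does not match the weights")
    return sum(weights[i] * v for i, v in entries.items()) + intercept


# Hypothetical model parameters (not the result of training).
weights = [0.5, -1.0, 2.0]
intercept = 0.25

dense_result = predict_dense(weights, intercept, [0.0, 3.0, 0.0])
sparse_result = predict_sparse(weights, intercept, 3, {1: 3.0})
print(dense_result, sparse_result)
```

The sparse form only touches the stored entries, which is the usual reason to prefer it when most features are zero.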
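Note: This article was selected and compiled by 纯净天空 from the original English work pyspark.mllib.regression.RidgeRegressionModel on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original author. Do not reprint or copy this translation without permission or authorization.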