当前位置: 首页>>代码示例 >>用法及示例精选 >>正文


Python pyspark RandomForestRegressor用法及代码示例


本文简要介绍 pyspark.ml.regression.RandomForestRegressor 的用法。

用法:

class pyspark.ml.regression.RandomForestRegressor(*, featuresCol='features', labelCol='label', predictionCol='prediction', maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity='variance', subsamplingRate=1.0, seed=None, numTrees=20, featureSubsetStrategy='auto', leafCol='', minWeightFractionPerNode=0.0, weightCol=None, bootstrap=True)

Random Forest 学习回归算法。它支持连续和分类特征。

1.4.0 版中的新函数。

例子

>>> from numpy import allclose
>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame([
...     (1.0, Vectors.dense(1.0)),
...     (0.0, Vectors.sparse(1, [], []))], ["label", "features"])
>>> rf = RandomForestRegressor(numTrees=2, maxDepth=2)
>>> rf.getMinWeightFractionPerNode()
0.0
>>> rf.setSeed(42)
RandomForestRegressor...
>>> model = rf.fit(df)
>>> model.getBootstrap()
True
>>> model.getSeed()
42
>>> model.setLeafCol("leafId")
RandomForestRegressionModel...
>>> model.featureImportances
SparseVector(1, {0: 1.0})
>>> allclose(model.treeWeights, [1.0, 1.0])
True
>>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
>>> model.predict(test0.head().features)
0.0
>>> model.predictLeaf(test0.head().features)
DenseVector([0.0, 0.0])
>>> result = model.transform(test0).head()
>>> result.prediction
0.0
>>> result.leafId
DenseVector([0.0, 0.0])
>>> model.numFeatures
1
>>> model.trees
[DecisionTreeRegressionModel...depth=..., DecisionTreeRegressionModel...]
>>> model.getNumTrees
2
>>> test1 = spark.createDataFrame([(Vectors.sparse(1, [0], [1.0]),)], ["features"])
>>> model.transform(test1).head().prediction
0.5
>>> rfr_path = temp_path + "/rfr"
>>> rf.save(rfr_path)
>>> rf2 = RandomForestRegressor.load(rfr_path)
>>> rf2.getNumTrees()
2
>>> model_path = temp_path + "/rfr_model"
>>> model.save(model_path)
>>> model2 = RandomForestRegressionModel.load(model_path)
>>> model.featureImportances == model2.featureImportances
True
>>> model.transform(test0).take(1) == model2.transform(test0).take(1)
True

相关用法


注:本文由纯净天空筛选整理自spark.apache.org大神的英文原创作品 pyspark.ml.regression.RandomForestRegressor。非经特殊声明,原始代码版权归原作者所有,本译文未经允许或授权,请勿转载或复制。