當前位置: 首頁>>代碼示例 >>用法及示例精選 >>正文


Python pyspark DecisionTreeRegressor用法及代碼示例


本文簡要介紹 pyspark.ml.regression.DecisionTreeRegressor 的用法。

用法:

class pyspark.ml.regression.DecisionTreeRegressor(*, featuresCol='features', labelCol='label', predictionCol='prediction', maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity='variance', seed=None, varianceCol=None, weightCol=None, leafCol='', minWeightFractionPerNode=0.0)

Decision tree 學習回歸算法。它支持連續和分類特征。

1.4.0 版中的新函數。

例子

>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame([
...     (1.0, Vectors.dense(1.0)),
...     (0.0, Vectors.sparse(1, [], []))], ["label", "features"])
>>> dt = DecisionTreeRegressor(maxDepth=2)
>>> dt.setVarianceCol("variance")
DecisionTreeRegressor...
>>> model = dt.fit(df)
>>> model.getVarianceCol()
'variance'
>>> model.setLeafCol("leafId")
DecisionTreeRegressionModel...
>>> model.depth
1
>>> model.numNodes
3
>>> model.featureImportances
SparseVector(1, {0: 1.0})
>>> model.numFeatures
1
>>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
>>> model.predict(test0.head().features)
0.0
>>> result = model.transform(test0).head()
>>> result.prediction
0.0
>>> model.predictLeaf(test0.head().features)
0.0
>>> result.leafId
0.0
>>> test1 = spark.createDataFrame([(Vectors.sparse(1, [0], [1.0]),)], ["features"])
>>> model.transform(test1).head().prediction
1.0
>>> dtr_path = temp_path + "/dtr"
>>> dt.save(dtr_path)
>>> dt2 = DecisionTreeRegressor.load(dtr_path)
>>> dt2.getMaxDepth()
2
>>> model_path = temp_path + "/dtr_model"
>>> model.save(model_path)
>>> model2 = DecisionTreeRegressionModel.load(model_path)
>>> model.numNodes == model2.numNodes
True
>>> model.depth == model2.depth
True
>>> model.transform(test1).head().variance
0.0
>>> model.transform(test0).take(1) == model2.transform(test0).take(1)
True
>>> df3 = spark.createDataFrame([
...     (1.0, 0.2, Vectors.dense(1.0)),
...     (1.0, 0.8, Vectors.dense(1.0)),
...     (0.0, 1.0, Vectors.sparse(1, [], []))], ["label", "weight", "features"])
>>> dt3 = DecisionTreeRegressor(maxDepth=2, weightCol="weight", varianceCol="variance")
>>> model3 = dt3.fit(df3)
>>> print(model3.toDebugString)
DecisionTreeRegressionModel...depth=1, numNodes=3...

相關用法


注:本文由純淨天空篩選整理自spark.apache.org大神的英文原創作品 pyspark.ml.regression.DecisionTreeRegressor。非經特殊聲明,原始代碼版權歸原作者所有,本譯文未經允許或授權,請勿轉載或複製。