本文简要介绍
pyspark.ml.regression.GBTRegressor
的用法。用法:
class pyspark.ml.regression.GBTRegressor(*, featuresCol='features', labelCol='label', predictionCol='prediction', maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0, checkpointInterval=10, lossType='squared', maxIter=20, stepSize=0.1, seed=None, impurity='variance', featureSubsetStrategy='all', validationTol=0.01, validationIndicatorCol=None, leafCol='', minWeightFractionPerNode=0.0, weightCol=None)
Gradient-Boosted Trees (GBTs) 学习回归算法。它支持连续和分类特征。
1.4.0 版中的新函数。
例子:
>>> from numpy import allclose >>> from pyspark.ml.linalg import Vectors >>> df = spark.createDataFrame([ ... (1.0, Vectors.dense(1.0)), ... (0.0, Vectors.sparse(1, [], []))], ["label", "features"]) >>> gbt = GBTRegressor(maxDepth=2, seed=42, leafCol="leafId") >>> gbt.setMaxIter(5) GBTRegressor... >>> gbt.setMinWeightFractionPerNode(0.049) GBTRegressor... >>> gbt.getMaxIter() 5 >>> print(gbt.getImpurity()) variance >>> print(gbt.getFeatureSubsetStrategy()) all >>> model = gbt.fit(df) >>> model.featureImportances SparseVector(1, {0: 1.0}) >>> model.numFeatures 1 >>> allclose(model.treeWeights, [1.0, 0.1, 0.1, 0.1, 0.1]) True >>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"]) >>> model.predict(test0.head().features) 0.0 >>> model.predictLeaf(test0.head().features) DenseVector([0.0, 0.0, 0.0, 0.0, 0.0]) >>> result = model.transform(test0).head() >>> result.prediction 0.0 >>> result.leafId DenseVector([0.0, 0.0, 0.0, 0.0, 0.0]) >>> test1 = spark.createDataFrame([(Vectors.sparse(1, [0], [1.0]),)], ["features"]) >>> model.transform(test1).head().prediction 1.0 >>> gbtr_path = temp_path + "gbtr" >>> gbt.save(gbtr_path) >>> gbt2 = GBTRegressor.load(gbtr_path) >>> gbt2.getMaxDepth() 2 >>> model_path = temp_path + "gbtr_model" >>> model.save(model_path) >>> model2 = GBTRegressionModel.load(model_path) >>> model.featureImportances == model2.featureImportances True >>> model.treeWeights == model2.treeWeights True >>> model.transform(test0).take(1) == model2.transform(test0).take(1) True >>> model.trees [DecisionTreeRegressionModel...depth=..., DecisionTreeRegressionModel...] >>> validation = spark.createDataFrame([(0.0, Vectors.dense(-1.0))], ... ["label", "features"]) >>> model.evaluateEachIteration(validation, "squared") [0.0, 0.0, 0.0, 0.0, 0.0] >>> gbt = gbt.setValidationIndicatorCol("validationIndicator") >>> gbt.getValidationIndicatorCol() 'validationIndicator' >>> gbt.getValidationTol() 0.01
相关用法
- Python pyspark GBTClassifier用法及代码示例
- Python pyspark GroupBy.mean用法及代码示例
- Python pyspark GroupBy.head用法及代码示例
- Python pyspark GroupedData.applyInPandas用法及代码示例
- Python pyspark GroupBy.cumsum用法及代码示例
- Python pyspark GroupBy.rank用法及代码示例
- Python pyspark GaussianMixtureModel用法及代码示例
- Python pyspark GroupBy.bfill用法及代码示例
- Python pyspark GradientBoostedTrees.trainRegressor用法及代码示例
- Python pyspark GroupBy.cummin用法及代码示例
- Python pyspark GroupBy.cummax用法及代码示例
- Python pyspark GroupedData.mean用法及代码示例
- Python pyspark GroupBy.fillna用法及代码示例
- Python pyspark GroupBy.apply用法及代码示例
- Python pyspark GroupedData.agg用法及代码示例
- Python pyspark GroupedData.pivot用法及代码示例
- Python pyspark GroupBy.diff用法及代码示例
- Python pyspark GroupBy.filter用法及代码示例
- Python pyspark GroupBy.transform用法及代码示例
- Python pyspark GroupedData.apply用法及代码示例
- Python pyspark GroupBy.cumcount用法及代码示例
- Python pyspark GroupedData.max用法及代码示例
- Python pyspark GaussianMixture用法及代码示例
- Python pyspark GroupedData.count用法及代码示例
- Python pyspark GroupedData.min用法及代码示例
注:本文由纯净天空筛选整理自spark.apache.org大神的英文原创作品 pyspark.ml.regression.GBTRegressor。非经特殊声明,原始代码版权归原作者所有,本译文未经允许或授权,请勿转载或复制。