This article briefly introduces the usage of pyspark.ml.classification.LogisticRegression.
Usage:
class pyspark.ml.classification.LogisticRegression(*, featuresCol='features', labelCol='label', predictionCol='prediction', maxIter=100, regParam=0.0, elasticNetParam=0.0, tol=1e-06, fitIntercept=True, threshold=0.5, thresholds=None, probabilityCol='probability', rawPredictionCol='rawPrediction', standardization=True, weightCol=None, aggregationDepth=2, family='auto', lowerBoundsOnCoefficients=None, upperBoundsOnCoefficients=None, lowerBoundsOnIntercepts=None, upperBoundsOnIntercepts=None, maxBlockSizeInMB=0.0)
Logistic regression. This class supports both multinomial logistic (softmax) regression and binomial logistic regression.
New in version 1.3.0.
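To make the binomial versus multinomial distinction concrete, here is a minimal pure-Python sketch of the two link functions involved. It is not Spark's implementation; the helper names `sigmoid` and `softmax` are chosen for illustration only, and the margins passed in are made-up values.

```python
import math


def sigmoid(margin):
    """Binomial case: map a single real-valued margin to P(label = 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


def softmax(margins):
    """Multinomial case: map one margin per class to a probability distribution."""
    peak = max(margins)  # subtract the maximum for numerical stability
    exps = [math.exp(m - peak) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


# A zero margin is maximally uncertain in the binomial model.
binary_p1 = sigmoid(0.0)

# Equal margins give equal class probabilities in the multinomial model.
multi_probs = softmax([1.0, 1.0, 1.0])

# With two classes, softmax over margins [0, m] agrees with sigmoid(m) for class 1.
two_class = softmax([0.0, 2.0])
```

With two classes the softmax form collapses to the sigmoid form, which is why a single class can cover both models.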
Examples:
>>> from pyspark.sql import Row
>>> from pyspark.ml.linalg import Vectors
>>> from pyspark.ml.classification import LogisticRegression, LogisticRegressionModel
>>> bdf = sc.parallelize([
...     Row(label=1.0, weight=1.0, features=Vectors.dense(0.0, 5.0)),
...     Row(label=0.0, weight=2.0, features=Vectors.dense(1.0, 2.0)),
...     Row(label=1.0, weight=3.0, features=Vectors.dense(2.0, 1.0)),
...     Row(label=0.0, weight=4.0, features=Vectors.dense(3.0, 3.0))]).toDF()
>>> blor = LogisticRegression(weightCol="weight")
>>> blor.getRegParam()
0.0
>>> blor.setRegParam(0.01)
LogisticRegression...
>>> blor.getRegParam()
0.01
>>> blor.setMaxIter(10)
LogisticRegression...
>>> blor.getMaxIter()
10
>>> blor.clear(blor.maxIter)
>>> blorModel = blor.fit(bdf)
>>> blorModel.setFeaturesCol("features")
LogisticRegressionModel...
>>> blorModel.setProbabilityCol("newProbability")
LogisticRegressionModel...
>>> blorModel.getProbabilityCol()
'newProbability'
>>> blorModel.getMaxBlockSizeInMB()
0.0
>>> blorModel.setThreshold(0.1)
LogisticRegressionModel...
>>> blorModel.getThreshold()
0.1
>>> blorModel.coefficients
DenseVector([-1.080..., -0.646...])
>>> blorModel.intercept
3.112...
>>> blorModel.evaluate(bdf).accuracy == blorModel.summary.accuracy
True
>>> data_path = "data/mllib/sample_multiclass_classification_data.txt"
>>> mdf = spark.read.format("libsvm").load(data_path)
>>> mlor = LogisticRegression(regParam=0.1, elasticNetParam=1.0, family="multinomial")
>>> mlorModel = mlor.fit(mdf)
>>> mlorModel.coefficientMatrix
SparseMatrix(3, 4, [0, 1, 2, 3], [3, 2, 1], [1.87..., -2.75..., -0.50...], 1)
>>> mlorModel.interceptVector
DenseVector([0.04..., -0.42..., 0.37...])
>>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0, 1.0))]).toDF()
>>> blorModel.predict(test0.head().features)
1.0
>>> blorModel.predictRaw(test0.head().features)
DenseVector([-3.54..., 3.54...])
>>> blorModel.predictProbability(test0.head().features)
DenseVector([0.028, 0.972])
>>> result = blorModel.transform(test0).head()
>>> result.prediction
1.0
>>> result.newProbability
DenseVector([0.02..., 0.97...])
>>> result.rawPrediction
DenseVector([-3.54..., 3.54...])
>>> test1 = sc.parallelize([Row(features=Vectors.sparse(2, [0], [1.0]))]).toDF()
>>> blorModel.transform(test1).head().prediction
1.0
>>> blor.setParams("vector")
Traceback (most recent call last):
    ...
TypeError: Method setParams forces keyword arguments.
>>> lr_path = temp_path + "/lr"
>>> blor.save(lr_path)
>>> lr2 = LogisticRegression.load(lr_path)
>>> lr2.getRegParam()
0.01
>>> model_path = temp_path + "/lr_model"
>>> blorModel.save(model_path)
>>> model2 = LogisticRegressionModel.load(model_path)
>>> blorModel.coefficients[0] == model2.coefficients[0]
True
>>> blorModel.intercept == model2.intercept
True
>>> model2
LogisticRegressionModel: uid=..., numClasses=2, numFeatures=2
>>> blorModel.transform(test0).take(1) == model2.transform(test0).take(1)
True
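The binary outputs in the example (rawPrediction, probability, prediction) are related in a simple way. The following pure-Python sketch shows that relationship. It is not Spark code. The function `binary_predict` is a hypothetical helper, and the margin 3.54 is only an illustrative value rounded from the truncated doctest output. The sketch assumes the raw prediction for two classes is the pair (-margin, margin) and that class 1 is predicted when its probability exceeds the threshold, which is consistent with the output shown above.

```python
import math


def binary_predict(margin, threshold=0.5):
    """Return (raw, probability, prediction) for a binary logistic model.

    `margin` stands for the linear score (coefficients . features + intercept).
    Hypothetical helper for illustration only; not part of pyspark.
    """
    p1 = 1.0 / (1.0 + math.exp(-margin))      # sigmoid of the margin
    raw = [-margin, margin]                   # assumed two-class raw prediction layout
    probability = [1.0 - p1, p1]
    prediction = 1.0 if p1 > threshold else 0.0
    return raw, probability, prediction


# Illustrative margin close to the doctest's truncated "3.54...".
raw, probability, prediction = binary_predict(3.54, threshold=0.1)

# A strongly negative margin falls below the 0.1 threshold and yields class 0.
_, low_probability, low_prediction = binary_predict(-3.54, threshold=0.1)
```

Lowering the threshold, as `setThreshold(0.1)` does in the example, makes the model predict class 1 more readily; raising it has the opposite effect.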
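Note: This article was selected and compiled by 纯净天空 from the original English documentation on spark.apache.org, pyspark.ml.classification.LogisticRegression. Unless otherwise stated, copyright of the original code belongs to its original authors. Do not reproduce or copy this translation without permission or authorization.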