This article briefly introduces the usage of
pyspark.ml.evaluation.ClusteringEvaluator
Usage:
class pyspark.ml.evaluation.ClusteringEvaluator(*, predictionCol='prediction', featuresCol='features', metricName='silhouette', distanceMeasure='squaredEuclidean', weightCol=None)
Evaluator for clustering results, which expects two input columns: prediction and features. The metric computes the silhouette measure using the squared Euclidean distance.
The silhouette is a measure of how consistent a point is with its own cluster. It ranges between 1 and -1, where a value close to 1 means that the points in a cluster are close to the other points in the same cluster and far from the points of the other clusters.
New in version 2.3.0.
Examples:
>>> from pyspark.ml.linalg import Vectors
>>> featureAndPredictions = map(lambda x: (Vectors.dense(x[0]), x[1]),
...     [([0.0, 0.5], 0.0), ([0.5, 0.0], 0.0), ([10.0, 11.0], 1.0),
...     ([10.5, 11.5], 1.0), ([1.0, 1.0], 0.0), ([8.0, 6.0], 1.0)])
>>> dataset = spark.createDataFrame(featureAndPredictions, ["features", "prediction"])
>>> evaluator = ClusteringEvaluator()
>>> evaluator.setPredictionCol("prediction")
ClusteringEvaluator...
>>> evaluator.evaluate(dataset)
0.9079...
>>> featureAndPredictionsWithWeight = map(lambda x: (Vectors.dense(x[0]), x[1], x[2]),
...     [([0.0, 0.5], 0.0, 2.5), ([0.5, 0.0], 0.0, 2.5), ([10.0, 11.0], 1.0, 2.5),
...     ([10.5, 11.5], 1.0, 2.5), ([1.0, 1.0], 0.0, 2.5), ([8.0, 6.0], 1.0, 2.5)])
>>> dataset = spark.createDataFrame(
...     featureAndPredictionsWithWeight, ["features", "prediction", "weight"])
>>> evaluator = ClusteringEvaluator()
>>> evaluator.setPredictionCol("prediction")
ClusteringEvaluator...
>>> evaluator.setWeightCol("weight")
ClusteringEvaluator...
>>> evaluator.evaluate(dataset)
0.9079...
>>> ce_path = temp_path + "/ce"
>>> evaluator.save(ce_path)
>>> evaluator2 = ClusteringEvaluator.load(ce_path)
>>> str(evaluator2.getPredictionCol())
'prediction'
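To make the metric itself concrete, the following is a minimal, illustrative plain-Python sketch of the textbook silhouette coefficient. Note this is not Spark's implementation: ClusteringEvaluator uses an efficient squared-Euclidean variant, so its values can differ slightly from this formula. The `silhouette` function and the sample data below are hypothetical names chosen for the sketch, and it assumes every cluster contains at least two points.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (textbook definition).

    Illustrative sketch only; assumes every cluster has >= 2 points.
    """
    scores = []
    clusters = set(labels)
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance from p to the other points in its own cluster
        same = [q for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        a = sum(math.dist(p, q) for q in same) / len(same)
        # b: mean distance from p to the points of the nearest other cluster
        b = float("inf")
        for other in clusters - {lab}:
            members = [q for q, l in zip(points, labels) if l == other]
            b = min(b, sum(math.dist(p, q) for q in members) / len(members))
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated clusters, so the score is close to 1
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = [0, 0, 1, 1]
print(round(silhouette(points, labels), 3))  # → 0.929
```

A score near 1, as here, indicates tight, well-separated clusters; a score near -1 would indicate points assigned to the wrong cluster.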
Note: This article was curated and compiled by 純淨天空 from the original English work pyspark.ml.evaluation.ClusteringEvaluator on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original author; please do not reproduce or copy this translation without permission or authorization.