This article briefly introduces the usage of
pyspark.ml.evaluation.ClusteringEvaluator
Usage:
class pyspark.ml.evaluation.ClusteringEvaluator(*, predictionCol='prediction', featuresCol='features', metricName='silhouette', distanceMeasure='squaredEuclidean', weightCol=None)
Evaluator for clustering results, which expects two input columns: prediction and features. The metric computes the silhouette measure using the squared Euclidean distance.
The silhouette is a measure of how consistent a point is with its own cluster. It ranges between 1 and -1, where a value close to 1 means that the points in a cluster are close to the other points in the same cluster and far from the points of the other clusters.
New in version 2.3.0.
Examples:
>>> from pyspark.ml.linalg import Vectors
>>> featureAndPredictions = map(lambda x: (Vectors.dense(x[0]), x[1]),
...     [([0.0, 0.5], 0.0), ([0.5, 0.0], 0.0), ([10.0, 11.0], 1.0),
...     ([10.5, 11.5], 1.0), ([1.0, 1.0], 0.0), ([8.0, 6.0], 1.0)])
>>> dataset = spark.createDataFrame(featureAndPredictions, ["features", "prediction"])
>>> evaluator = ClusteringEvaluator()
>>> evaluator.setPredictionCol("prediction")
ClusteringEvaluator...
>>> evaluator.evaluate(dataset)
0.9079...
>>> featureAndPredictionsWithWeight = map(lambda x: (Vectors.dense(x[0]), x[1], x[2]),
...     [([0.0, 0.5], 0.0, 2.5), ([0.5, 0.0], 0.0, 2.5), ([10.0, 11.0], 1.0, 2.5),
...     ([10.5, 11.5], 1.0, 2.5), ([1.0, 1.0], 0.0, 2.5), ([8.0, 6.0], 1.0, 2.5)])
>>> dataset = spark.createDataFrame(
...     featureAndPredictionsWithWeight, ["features", "prediction", "weight"])
>>> evaluator = ClusteringEvaluator()
>>> evaluator.setPredictionCol("prediction")
ClusteringEvaluator...
>>> evaluator.setWeightCol("weight")
ClusteringEvaluator...
>>> evaluator.evaluate(dataset)
0.9079...
>>> ce_path = temp_path + "/ce"
>>> evaluator.save(ce_path)
>>> evaluator2 = ClusteringEvaluator.load(ce_path)
>>> str(evaluator2.getPredictionCol())
'prediction'
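To make the metric itself concrete, the following is a minimal, illustrative plain-Python sketch of the textbook silhouette coefficient. Note this is not Spark's implementation: ClusteringEvaluator uses an efficient squared-Euclidean variant, so its values can differ slightly from this formula. The `silhouette` function and the sample data below are hypothetical names chosen for the sketch, and it assumes every cluster contains at least two points.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (textbook definition).

    Illustrative sketch only; assumes every cluster has >= 2 points.
    """
    scores = []
    clusters = set(labels)
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance from p to the other points in its own cluster
        same = [q for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        a = sum(math.dist(p, q) for q in same) / len(same)
        # b: mean distance from p to the points of the nearest other cluster
        b = float("inf")
        for other in clusters - {lab}:
            members = [q for q, l in zip(points, labels) if l == other]
            b = min(b, sum(math.dist(p, q) for q in members) / len(members))
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated clusters, so the score is close to 1
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = [0, 0, 1, 1]
print(round(silhouette(points, labels), 3))  # → 0.929
```

A score near 1, as here, indicates tight, well-separated clusters; a score near -1 would indicate points assigned to the wrong cluster.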
Note: This article was curated and compiled by 純淨天空 from the original English work pyspark.ml.evaluation.ClusteringEvaluator on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original author; please do not reproduce or copy this translation without permission or authorization.