Python pyspark VarianceThresholdSelector usage and code examples

This article briefly introduces the usage of pyspark.ml.feature.VarianceThresholdSelector.

Usage:

class pyspark.ml.feature.VarianceThresholdSelector(*, featuresCol='features', outputCol=None, varianceThreshold=0.0)

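A feature selector that removes all low-variance features. Features whose variance is not greater than the threshold are removed. By default, all features with non-zero variance are kept, i.e. only features that have the same value in every sample are removed.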

New in version 3.1.0.
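The selection rule described above can be sketched in plain Python without Spark. This is only an illustration of the rule "keep a feature if its variance is strictly greater than the threshold". The helper name `select_by_variance` and the toy data are hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption rather than something stated in this article.

```python
from statistics import variance


def select_by_variance(rows, threshold=0.0):
    """Return the indices of columns whose variance exceeds `threshold`.

    Hypothetical helper for illustration only; it assumes the sample
    variance (n - 1 denominator), which this article does not confirm.
    """
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    return [i for i, col in enumerate(columns) if variance(col) > threshold]


# Toy data: column 1 holds the same value in every sample.
rows = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 4.0],
    [3.0, 5.0, 8.0],
]

kept_default = select_by_variance(rows)      # default threshold 0.0
kept_strict = select_by_variance(rows, 2.0)  # a stricter threshold

print(kept_default)  # [0, 2]: only the constant column is dropped
print(kept_strict)   # [2]: column 0 (variance 1.0) is dropped as well
```

With the default threshold of 0.0 only the constant feature is removed, which mirrors the default behavior described for `varianceThreshold`.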

Examples

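>>> # `spark` (a SparkSession) and `temp_path` are assumed to be defined beforehand
>>> from pyspark.ml.feature import VarianceThresholdSelector, VarianceThresholdSelectorModel
>>> from pyspark.ml.linalg import Vectors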
>>> df = spark.createDataFrame(
...    [(Vectors.dense([6.0, 7.0, 0.0, 7.0, 6.0, 0.0]),),
...     (Vectors.dense([0.0, 9.0, 6.0, 0.0, 5.0, 9.0]),),
...     (Vectors.dense([0.0, 9.0, 3.0, 0.0, 5.0, 5.0]),),
...     (Vectors.dense([0.0, 9.0, 8.0, 5.0, 6.0, 4.0]),),
...     (Vectors.dense([8.0, 9.0, 6.0, 5.0, 4.0, 4.0]),),
...     (Vectors.dense([8.0, 9.0, 6.0, 0.0, 0.0, 0.0]),)],
...    ["features"])
>>> selector = VarianceThresholdSelector(varianceThreshold=8.2, outputCol="selectedFeatures")
>>> model = selector.fit(df)
>>> model.getFeaturesCol()
'features'
>>> model.setFeaturesCol("features")
VarianceThresholdSelectorModel...
>>> model.transform(df).head().selectedFeatures
DenseVector([6.0, 7.0, 0.0])
>>> model.selectedFeatures
[0, 3, 5]
>>> varianceThresholdSelectorPath = temp_path + "/variance-threshold-selector"
>>> selector.save(varianceThresholdSelectorPath)
>>> loadedSelector = VarianceThresholdSelector.load(varianceThresholdSelectorPath)
>>> loadedSelector.getVarianceThreshold() == selector.getVarianceThreshold()
True
>>> modelPath = temp_path + "/variance-threshold-selector-model"
>>> model.save(modelPath)
>>> loadedModel = VarianceThresholdSelectorModel.load(modelPath)
>>> loadedModel.selectedFeatures == model.selectedFeatures
True
>>> loadedModel.transform(df).take(1) == model.transform(df).take(1)
True
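To see why the example above yields `model.selectedFeatures == [0, 3, 5]`, the per-feature variances of the six input vectors can be checked by hand. The sketch below uses only the Python standard library and assumes the sample variance (n - 1 denominator); the variable names are illustrative and not part of the PySpark API.

```python
from statistics import variance

# The six feature vectors from the example, as plain Python lists.
rows = [
    [6.0, 7.0, 0.0, 7.0, 6.0, 0.0],
    [0.0, 9.0, 6.0, 0.0, 5.0, 9.0],
    [0.0, 9.0, 3.0, 0.0, 5.0, 5.0],
    [0.0, 9.0, 8.0, 5.0, 6.0, 4.0],
    [8.0, 9.0, 6.0, 5.0, 4.0, 4.0],
    [8.0, 9.0, 6.0, 0.0, 0.0, 0.0],
]
threshold = 8.2  # same value as varianceThreshold in the example

# Sample variance of each feature (column); an assumption, see the lead-in.
variances = [variance(col) for col in zip(*rows)]

# Keep the features whose variance is strictly greater than the threshold.
selected = [i for i, v in enumerate(variances) if v > threshold]

print([round(v, 3) for v in variances])
# [16.667, 0.667, 8.167, 10.167, 5.067, 11.467]
print(selected)
# [0, 3, 5]
```

Feature 2 has a variance of about 8.167, just below the 8.2 threshold, so it is removed along with features 1 and 4. The first row then reduces to `[6.0, 7.0, 0.0]`, matching the `DenseVector` shown in the example.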

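Note: This article was selected and compiled by 純淨天空 from the original English documentation for pyspark.ml.feature.VarianceThresholdSelector at spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.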