Python pyspark VarianceThresholdSelector usage and code examples


This article briefly describes how to use pyspark.ml.feature.VarianceThresholdSelector.

Usage:

class pyspark.ml.feature.VarianceThresholdSelector(*, featuresCol='features', outputCol=None, varianceThreshold=0.0)

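A feature selector that removes all low-variance features. Any feature whose variance is not greater than varianceThreshold is removed. With the default threshold of 0.0, every feature with non-zero variance is kept, so only features that have the same value in all samples are removed.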
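
The selection rule can be illustrated without Spark. The sketch below is not the library's implementation: `select_by_variance` and `project` are hypothetical helper names, and it assumes the unbiased sample variance (n - 1 denominator), which is what Python's `statistics.variance` computes. It uses the same six rows as the example in this article.

```python
from statistics import variance


def select_by_variance(rows, threshold=0.0):
    """Return the indices of columns whose sample variance exceeds threshold.

    Hypothetical helper for illustration only; assumes the unbiased
    sample variance (n - 1 denominator).
    """
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    return [i for i, col in enumerate(columns) if variance(col) > threshold]


def project(row, indices):
    """Keep only the selected feature positions of a single row."""
    return [row[i] for i in indices]


rows = [
    [6.0, 7.0, 0.0, 7.0, 6.0, 0.0],
    [0.0, 9.0, 6.0, 0.0, 5.0, 9.0],
    [0.0, 9.0, 3.0, 0.0, 5.0, 5.0],
    [0.0, 9.0, 8.0, 5.0, 6.0, 4.0],
    [8.0, 9.0, 6.0, 5.0, 4.0, 4.0],
    [8.0, 9.0, 6.0, 0.0, 0.0, 0.0],
]

selected = select_by_variance(rows, threshold=8.2)
print(selected)                    # [0, 3, 5]
print(project(rows[0], selected))  # [6.0, 7.0, 0.0]
```

Under this assumption the third feature has a variance of about 8.17, just below the 8.2 threshold, so it is dropped along with the second and fifth features.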

New in version 3.1.0.

Examples

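>>> # Assumes an active SparkSession bound to the name `spark`.
>>> from pyspark.ml.feature import VarianceThresholdSelector, VarianceThresholdSelectorModel
>>> from pyspark.ml.linalg import Vectors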
>>> df = spark.createDataFrame(
...    [(Vectors.dense([6.0, 7.0, 0.0, 7.0, 6.0, 0.0]),),
...     (Vectors.dense([0.0, 9.0, 6.0, 0.0, 5.0, 9.0]),),
...     (Vectors.dense([0.0, 9.0, 3.0, 0.0, 5.0, 5.0]),),
...     (Vectors.dense([0.0, 9.0, 8.0, 5.0, 6.0, 4.0]),),
...     (Vectors.dense([8.0, 9.0, 6.0, 5.0, 4.0, 4.0]),),
...     (Vectors.dense([8.0, 9.0, 6.0, 0.0, 0.0, 0.0]),)],
...    ["features"])
>>> selector = VarianceThresholdSelector(varianceThreshold=8.2, outputCol="selectedFeatures")
>>> model = selector.fit(df)
>>> model.getFeaturesCol()
'features'
>>> model.setFeaturesCol("features")
VarianceThresholdSelectorModel...
>>> model.transform(df).head().selectedFeatures
DenseVector([6.0, 7.0, 0.0])
>>> model.selectedFeatures
[0, 3, 5]
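>>> # temp_path is assumed to be a writable directory; the original doctest environment provides it.
>>> varianceThresholdSelectorPath = temp_path + "/variance-threshold-selector"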
>>> selector.save(varianceThresholdSelectorPath)
>>> loadedSelector = VarianceThresholdSelector.load(varianceThresholdSelectorPath)
>>> loadedSelector.getVarianceThreshold() == selector.getVarianceThreshold()
True
>>> modelPath = temp_path + "/variance-threshold-selector-model"
>>> model.save(modelPath)
>>> loadedModel = VarianceThresholdSelectorModel.load(modelPath)
>>> loadedModel.selectedFeatures == model.selectedFeatures
True
>>> loadedModel.transform(df).take(1) == model.transform(df).take(1)
True

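Note: this article was selected and compiled by 纯净天空 from the original English documentation for pyspark.ml.feature.VarianceThresholdSelector on spark.apache.org. Unless otherwise stated, copyright in the original code belongs to its original authors. Do not reproduce or copy this translation without permission or authorization.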