This article briefly describes how to use
pyspark.ml.feature.VarianceThresholdSelector.
Usage:
class pyspark.ml.feature.VarianceThresholdSelector(*, featuresCol='features', outputCol=None, varianceThreshold=0.0)
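A feature selector that removes all low-variance features. Features whose variance is not greater than varianceThreshold are removed. With the default varianceThreshold=0.0, every feature with non-zero variance is kept, so only features that have the same value in all samples are removed.
New in version 3.1.0.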
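The selection rule described above can be illustrated without Spark. The following is a minimal sketch in plain Python, not the library's implementation: the helper name `select_by_variance` is hypothetical, and it assumes the variance is the unbiased sample variance (n - 1 denominator), which is what `statistics.variance` computes. The data are the same six rows used in the example below.

```python
from statistics import variance


def select_by_variance(rows, threshold=0.0):
    """Return the indices of columns whose sample variance exceeds threshold.

    Hypothetical helper for illustration only; assumes the unbiased
    sample variance (n - 1 denominator).
    """
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    # Keep a feature only if its variance is strictly greater than the threshold.
    return [i for i, col in enumerate(columns) if variance(col) > threshold]


rows = [
    [6.0, 7.0, 0.0, 7.0, 6.0, 0.0],
    [0.0, 9.0, 6.0, 0.0, 5.0, 9.0],
    [0.0, 9.0, 3.0, 0.0, 5.0, 5.0],
    [0.0, 9.0, 8.0, 5.0, 6.0, 4.0],
    [8.0, 9.0, 6.0, 5.0, 4.0, 4.0],
    [8.0, 9.0, 6.0, 0.0, 0.0, 0.0],
]

# With threshold 8.2, only features 0, 3 and 5 vary enough to be kept.
selected = select_by_variance(rows, threshold=8.2)
print(selected)  # [0, 3, 5]

# With the default threshold 0.0, only constant features are dropped.
constant_free = select_by_variance([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]])
print(constant_free)  # [1]
```

Feature 2 has a sample variance of about 8.17, just under the 8.2 threshold, which is why it is dropped even though its values clearly vary.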
Example:
>>> # Assumes `spark` is an active SparkSession and `temp_path` is a writable directory.
>>> from pyspark.ml.feature import VarianceThresholdSelector, VarianceThresholdSelectorModel
>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame(
...    [(Vectors.dense([6.0, 7.0, 0.0, 7.0, 6.0, 0.0]),),
...     (Vectors.dense([0.0, 9.0, 6.0, 0.0, 5.0, 9.0]),),
...     (Vectors.dense([0.0, 9.0, 3.0, 0.0, 5.0, 5.0]),),
...     (Vectors.dense([0.0, 9.0, 8.0, 5.0, 6.0, 4.0]),),
...     (Vectors.dense([8.0, 9.0, 6.0, 5.0, 4.0, 4.0]),),
...     (Vectors.dense([8.0, 9.0, 6.0, 0.0, 0.0, 0.0]),)],
...    ["features"])
>>> selector = VarianceThresholdSelector(varianceThreshold=8.2, outputCol="selectedFeatures")
>>> model = selector.fit(df)
>>> model.getFeaturesCol()
'features'
>>> model.setFeaturesCol("features")
VarianceThresholdSelectorModel...
>>> model.transform(df).head().selectedFeatures
DenseVector([6.0, 7.0, 0.0])
>>> model.selectedFeatures
[0, 3, 5]
>>> varianceThresholdSelectorPath = temp_path + "/variance-threshold-selector"
>>> selector.save(varianceThresholdSelectorPath)
>>> loadedSelector = VarianceThresholdSelector.load(varianceThresholdSelectorPath)
>>> loadedSelector.getVarianceThreshold() == selector.getVarianceThreshold()
True
>>> modelPath = temp_path + "/variance-threshold-selector-model"
>>> model.save(modelPath)
>>> loadedModel = VarianceThresholdSelectorModel.load(modelPath)
>>> loadedModel.selectedFeatures == model.selectedFeatures
True
>>> loadedModel.transform(df).take(1) == model.transform(df).take(1)
True
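Note: This article was selected and compiled by 纯净天空 from the original English documentation for pyspark.ml.feature.VarianceThresholdSelector on spark.apache.org. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.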