This article briefly introduces the usage of pyspark.ml.feature.VarianceThresholdSelector.
Usage:
class pyspark.ml.feature.VarianceThresholdSelector(*, featuresCol='features', outputCol=None, varianceThreshold=0.0)
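A feature selector that removes all low-variance features. Features whose variance is not greater than the threshold are removed. By default (varianceThreshold=0.0), all features with non-zero variance are kept, i.e. only features that have the same value in every sample are removed.
New in version 3.1.0.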
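The selection rule described above can be sketched in plain Python, without Spark. The helper name `select_by_variance` and the toy rows are hypothetical, and the sketch assumes that the sample variance (n - 1 denominator) is the quantity compared against the threshold. It illustrates the rule only and is not Spark's implementation.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def select_by_variance(rows, threshold=0.0):
    """Return indices of columns whose sample variance is strictly greater
    than `threshold`.

    Hypothetical helper for illustration; not part of pyspark.
    """
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    return [i for i, col in enumerate(columns) if variance(col) > threshold]


# Toy data (assumed for illustration): the middle feature is constant.
rows = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 3.0],
    [4.0, 5.0, 1.0],
]

kept = select_by_variance(rows)  # default threshold of 0.0
print(kept)  # [0, 2]
```

With the default threshold, only the constant feature at index 1 is dropped, because its variance is exactly zero and therefore not greater than the threshold.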
Examples:
>>> # Assumes an active SparkSession named spark and a writable directory path in temp_path.
>>> from pyspark.ml.feature import VarianceThresholdSelector, VarianceThresholdSelectorModel
>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame(
...     [(Vectors.dense([6.0, 7.0, 0.0, 7.0, 6.0, 0.0]),),
...      (Vectors.dense([0.0, 9.0, 6.0, 0.0, 5.0, 9.0]),),
...      (Vectors.dense([0.0, 9.0, 3.0, 0.0, 5.0, 5.0]),),
...      (Vectors.dense([0.0, 9.0, 8.0, 5.0, 6.0, 4.0]),),
...      (Vectors.dense([8.0, 9.0, 6.0, 5.0, 4.0, 4.0]),),
...      (Vectors.dense([8.0, 9.0, 6.0, 0.0, 0.0, 0.0]),)],
...     ["features"])
>>> selector = VarianceThresholdSelector(varianceThreshold=8.2, outputCol="selectedFeatures")
>>> model = selector.fit(df)
>>> model.getFeaturesCol()
'features'
>>> model.setFeaturesCol("features")
VarianceThresholdSelectorModel...
>>> model.transform(df).head().selectedFeatures
DenseVector([6.0, 7.0, 0.0])
>>> model.selectedFeatures
[0, 3, 5]
>>> varianceThresholdSelectorPath = temp_path + "/variance-threshold-selector"
>>> selector.save(varianceThresholdSelectorPath)
>>> loadedSelector = VarianceThresholdSelector.load(varianceThresholdSelectorPath)
>>> loadedSelector.getVarianceThreshold() == selector.getVarianceThreshold()
True
>>> modelPath = temp_path + "/variance-threshold-selector-model"
>>> model.save(modelPath)
>>> loadedModel = VarianceThresholdSelectorModel.load(modelPath)
>>> loadedModel.selectedFeatures == model.selectedFeatures
True
>>> loadedModel.transform(df).take(1) == model.transform(df).take(1)
True
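To see why indices 0, 3 and 5 survive the threshold of 8.2 in the example above, the per-feature variances can be recomputed with the Python standard library. This is a rough check that runs without Spark. It assumes the selector compares the sample variance (n - 1 denominator) against the threshold, and the variable names are illustrative only.

```python
from statistics import variance  # sample variance (n - 1 denominator)

# The six feature vectors from the example, as plain lists.
rows = [
    [6.0, 7.0, 0.0, 7.0, 6.0, 0.0],
    [0.0, 9.0, 6.0, 0.0, 5.0, 9.0],
    [0.0, 9.0, 3.0, 0.0, 5.0, 5.0],
    [0.0, 9.0, 8.0, 5.0, 6.0, 4.0],
    [8.0, 9.0, 6.0, 5.0, 4.0, 4.0],
    [8.0, 9.0, 6.0, 0.0, 0.0, 0.0],
]
threshold = 8.2

# One variance per feature (column).
variances = [variance(col) for col in zip(*rows)]

# Keep only the features whose variance is strictly greater than the threshold.
selected = [i for i, v in enumerate(variances) if v > threshold]

for i, v in enumerate(variances):
    print(i, round(v, 3), "kept" if v > threshold else "dropped")

print(selected)                         # [0, 3, 5]
print([rows[0][i] for i in selected])   # [6.0, 7.0, 0.0]
```

Under this assumption, feature 2 has a sample variance of about 8.167, just below the threshold of 8.2, so it is dropped along with features 1 and 4.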
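Note: This article was selected and compiled by 純淨天空 from the original English documentation pyspark.ml.feature.VarianceThresholdSelector published on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.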