This page briefly describes how to use pyspark.ml.feature.StandardScaler.
Usage:
class pyspark.ml.feature.StandardScaler(*, withMean=False, withStd=True, inputCol=None, outputCol=None)
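Standardizes features by removing the mean and scaling to unit variance, using column summary statistics computed on the samples in the training set.
The "unit std" is computed using the corrected sample standard deviation, which is the square root of the unbiased sample variance.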
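To make the "corrected" part concrete, the following is a small sketch that uses only Python's standard library (it is not Spark code). It contrasts the corrected sample standard deviation, which divides by n - 1, with the population standard deviation, which divides by n. The two input values are the ones used in the example on this page; the variable names are illustrative only.

```python
import math
import statistics

values = [0.0, 2.0]
n = len(values)
mean = sum(values) / n

# Unbiased sample variance: squared deviations divided by (n - 1).
sample_variance = sum((x - mean) ** 2 for x in values) / (n - 1)
corrected_std = math.sqrt(sample_variance)

# statistics.stdev uses the same (n - 1) denominator.
stdlib_std = statistics.stdev(values)

# The population standard deviation divides by n instead, for comparison.
population_std = statistics.pstdev(values)

print(round(corrected_std, 4))   # 1.4142
print(round(stdlib_std, 4))      # 1.4142
print(round(population_std, 4))  # 1.0
```

The value 1.4142 matches the `model.std` reported in the example, which is consistent with the n - 1 denominator described above.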
New in version 1.4.0.
Examples:
>>> from pyspark.ml.feature import StandardScaler, StandardScalerModel
>>> from pyspark.ml.linalg import Vectors
>>> # `spark` (a SparkSession) and `temp_path` (a writable directory) are assumed to exist already.
>>> df = spark.createDataFrame([(Vectors.dense([0.0]),), (Vectors.dense([2.0]),)], ["a"])
>>> standardScaler = StandardScaler()
>>> standardScaler.setInputCol("a")
StandardScaler...
>>> standardScaler.setOutputCol("scaled")
StandardScaler...
>>> model = standardScaler.fit(df)
>>> model.getInputCol()
'a'
>>> model.setOutputCol("output")
StandardScalerModel...
>>> model.mean
DenseVector([1.0])
>>> model.std
DenseVector([1.4142])
>>> model.transform(df).collect()[1].output
DenseVector([1.4142])
>>> standardScalerPath = temp_path + "/standard-scaler"
>>> standardScaler.save(standardScalerPath)
>>> loadedStandardScaler = StandardScaler.load(standardScalerPath)
>>> loadedStandardScaler.getWithMean() == standardScaler.getWithMean()
True
>>> loadedStandardScaler.getWithStd() == standardScaler.getWithStd()
True
>>> modelPath = temp_path + "/standard-scaler-model"
>>> model.save(modelPath)
>>> loadedModel = StandardScalerModel.load(modelPath)
>>> loadedModel.std == model.std
True
>>> loadedModel.mean == model.mean
True
>>> loadedModel.transform(df).take(1) == model.transform(df).take(1)
True
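The numbers in the example can be cross-checked by hand. The sketch below is plain Python, not Spark's implementation: `standardize_column` is a hypothetical helper written for illustration, and it assumes at least two values and a non-zero standard deviation. It applies the two flags from the signature, `withMean` and `withStd`, to a single feature column.

```python
import statistics


def standardize_column(values, with_mean=False, with_std=True):
    """Hypothetical helper: standardize one feature column.

    with_mean subtracts the column mean; with_std divides by the
    corrected sample standard deviation (n - 1 denominator).
    """
    mean = sum(values) / len(values)
    std = statistics.stdev(values)
    shift = mean if with_mean else 0.0
    scale = std if with_std else 1.0
    return [(x - shift) / scale for x in values]


column = [0.0, 2.0]

# Defaults from the signature: withMean=False, withStd=True.
default_scaled = standardize_column(column)

# Centering as well as scaling.
centered_scaled = standardize_column(column, with_mean=True)

print([round(v, 4) for v in default_scaled])   # [0.0, 1.4142]
print([round(v, 4) for v in centered_scaled])  # [-0.7071, 0.7071]
```

With the default flags, the second row comes out as 1.4142, the same value the example shows for `model.transform(df).collect()[1].output`.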
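Note: This page was selected and compiled by 純淨天空 from the original English documentation for pyspark.ml.feature.StandardScaler on spark.apache.org. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.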