Python pyspark StandardScaler usage and code examples

This article briefly introduces the usage of pyspark.ml.feature.StandardScaler.

Usage: class pyspark.ml.feature.StandardScaler(*, withMean=False, withStd=True, inputCol=None, outputCol=None)

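Standardizes features by removing the mean and scaling to unit variance, using column summary statistics computed on the samples in the training set.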
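As a rough illustration of what "removing the mean and scaling to unit variance" means per column, here is a minimal pure-Python sketch. It is not Spark's implementation: the helper names `column_stats` and `standardize` and the sample rows are hypothetical, and mapping a zero-variance column to 0.0 is a choice made for this sketch.

```python
import math


def column_stats(rows):
    """Return per-column means and corrected sample standard deviations."""
    n = len(rows)  # the sketch assumes at least two rows
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(dims)
    ]
    return means, stds


def standardize(rows, with_mean=False, with_std=True):
    """Hypothetical helper mirroring the withMean / withStd flags."""
    means, stds = column_stats(rows)
    out = []
    for r in rows:
        scaled = []
        for j, x in enumerate(r):
            if with_mean:
                x = x - means[j]  # center the column
            if with_std:
                # Assumption of this sketch: a constant column becomes 0.0.
                x = x / stds[j] if stds[j] != 0 else 0.0
            scaled.append(x)
        out.append(scaled)
    return out


rows = [[0.0, 10.0], [2.0, 30.0]]
default_scaled = standardize(rows)                   # scale only
centered_scaled = standardize(rows, with_mean=True)  # center, then scale
print(default_scaled)
print(centered_scaled)
```

With the default flags (withMean=False, withStd=True) the values are only divided by the column standard deviation, so they are not centered around zero. Setting with_mean=True in the sketch centers each column first.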

The “unit std” is computed using the corrected sample standard deviation, which is the square root of the unbiased sample variance.
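To make the distinction concrete, the following sketch compares the corrected sample standard deviation (divides by n - 1) with the population standard deviation (divides by n), using only Python's standard library. The two values 0.0 and 2.0 are chosen as an assumption for illustration; no Spark session is involved.

```python
import statistics

values = [0.0, 2.0]

# Corrected sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1)).
sample_std = statistics.stdev(values)

# Population standard deviation: sqrt(sum((x - mean)^2) / n), shown for contrast.
population_std = statistics.pstdev(values)

# Scaling by the sample std without centering (withMean=False, withStd=True).
scaled = [v / sample_std for v in values]

print(round(sample_std, 4))      # 1.4142
print(round(population_std, 4))  # 1.0
print([round(v, 4) for v in scaled])  # [0.0, 1.4142]
```

Had the population formula been used instead, the standard deviation here would be 1.0 rather than 1.4142.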

New in version 1.4.0.

Examples:

>>> from pyspark.ml.feature import StandardScaler, StandardScalerModel
>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame([(Vectors.dense([0.0]),), (Vectors.dense([2.0]),)], ["a"])
>>> standardScaler = StandardScaler()
>>> standardScaler.setInputCol("a")
StandardScaler...
>>> standardScaler.setOutputCol("scaled")
StandardScaler...
>>> model = standardScaler.fit(df)
>>> model.getInputCol()
'a'
>>> model.setOutputCol("output")
StandardScalerModel...
>>> model.mean
DenseVector([1.0])
>>> model.std
DenseVector([1.4142])
>>> model.transform(df).collect()[1].output
DenseVector([1.4142])
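>>> import tempfile
>>> temp_path = tempfile.mkdtemp()  # assumed scratch directory; the original leaves temp_path undefined
>>> standardScalerPath = temp_path + "/standard-scaler"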
>>> standardScaler.save(standardScalerPath)
>>> loadedStandardScaler = StandardScaler.load(standardScalerPath)
>>> loadedStandardScaler.getWithMean() == standardScaler.getWithMean()
True
>>> loadedStandardScaler.getWithStd() == standardScaler.getWithStd()
True
>>> modelPath = temp_path + "/standard-scaler-model"
>>> model.save(modelPath)
>>> loadedModel = StandardScalerModel.load(modelPath)
>>> loadedModel.std == model.std
True
>>> loadedModel.mean == model.mean
True
>>> loadedModel.transform(df).take(1) == model.transform(df).take(1)
True


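Note: This article was selected and compiled by 純淨天空 from the original English documentation for pyspark.ml.feature.StandardScaler on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to its original authors. Do not reproduce or copy this translation without permission or authorization.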