This article briefly introduces the usage of pyspark.ml.feature.Binarizer.
Usage:
class pyspark.ml.feature.Binarizer(*, threshold=0.0, inputCol=None, outputCol=None, thresholds=None, inputCols=None, outputCols=None)
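Binarizes a column of continuous features given a threshold. Since 3.0.0, Binarizer can map multiple columns at once by setting the inputCols parameter. Note that when both the inputCol and inputCols parameters are set, an exception will be thrown. The threshold parameter is used for a single column, while thresholds is used for multiple columns.
New in version 1.4.0.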
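The thresholding rule described above can be illustrated without a Spark cluster. The following is a minimal plain-Python sketch, not Spark's implementation: it assumes the documented behavior that values strictly greater than the threshold map to 1.0 and all other values map to 0.0. The names `binarize` and `binarize_columns` are hypothetical helpers introduced only for this illustration.

```python
from typing import Dict, Sequence


def binarize(value: float, threshold: float = 0.0) -> float:
    """Return 1.0 if value is strictly greater than threshold, else 0.0."""
    return 1.0 if value > threshold else 0.0


def binarize_columns(
    row: Dict[str, float],
    input_cols: Sequence[str],
    output_cols: Sequence[str],
    thresholds: Sequence[float],
) -> Dict[str, float]:
    """Hypothetical helper: apply one threshold per input column to a single row.

    Mirrors the multi-column mode (inputCols / outputCols / thresholds)
    on a plain dict instead of a DataFrame.
    """
    if not (len(input_cols) == len(output_cols) == len(thresholds)):
        raise ValueError("input_cols, output_cols and thresholds must have equal length")
    result = dict(row)  # copy so the input row is left unchanged
    for in_col, out_col, t in zip(input_cols, output_cols, thresholds):
        result[out_col] = binarize(row[in_col], t)
    return result


if __name__ == "__main__":
    # Single-column mode: one value, one threshold.
    print(binarize(0.5, threshold=1.0))
    print(binarize(0.5, threshold=-0.5))
    # Multi-column mode: one threshold per column.
    print(
        binarize_columns(
            {"values1": 0.5, "values2": 0.3},
            ["values1", "values2"],
            ["output1", "output2"],
            [0.0, 1.0],
        )
    )
```

The sketch uses the same inputs as the Spark examples in this article (0.5 against thresholds 1.0 and -0.5, and the row 0.5, 0.3 against thresholds 0.0 and 1.0), so its results can be compared directly with the outputs shown there.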
Examples:
>>> df = spark.createDataFrame([(0.5,)], ["values"])
>>> binarizer = Binarizer(threshold=1.0, inputCol="values", outputCol="features")
>>> binarizer.setThreshold(1.0)
Binarizer...
>>> binarizer.setInputCol("values")
Binarizer...
>>> binarizer.setOutputCol("features")
Binarizer...
>>> binarizer.transform(df).head().features
0.0
>>> binarizer.setParams(outputCol="freqs").transform(df).head().freqs
0.0
>>> params = {binarizer.threshold: -0.5, binarizer.outputCol: "vector"}
>>> binarizer.transform(df, params).head().vector
1.0
>>> binarizerPath = temp_path + "/binarizer"
>>> binarizer.save(binarizerPath)
>>> loadedBinarizer = Binarizer.load(binarizerPath)
>>> loadedBinarizer.getThreshold() == binarizer.getThreshold()
True
>>> loadedBinarizer.transform(df).take(1) == binarizer.transform(df).take(1)
True
>>> df2 = spark.createDataFrame([(0.5, 0.3)], ["values1", "values2"])
>>> binarizer2 = Binarizer(thresholds=[0.0, 1.0])
>>> binarizer2.setInputCols(["values1", "values2"]).setOutputCols(["output1", "output2"])
Binarizer...
>>> binarizer2.transform(df2).show()
+-------+-------+-------+-------+
|values1|values2|output1|output2|
+-------+-------+-------+-------+
|    0.5|    0.3|    1.0|    0.0|
+-------+-------+-------+-------+
...
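Note: This article was selected and compiled by 純淨天空 from the original English documentation for pyspark.ml.feature.Binarizer on spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.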