This article briefly describes how to use pyspark.ml.feature.Binarizer.
Usage:
class pyspark.ml.feature.Binarizer(*, threshold=0.0, inputCol=None, outputCol=None, thresholds=None, inputCols=None, outputCols=None)
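Binarizes a column of continuous features against a given threshold: values greater than the threshold map to 1.0, and values equal to or less than it map to 0.0. Since 3.0.0, Binarizer can map multiple columns at once by setting the inputCols parameter. Note that an exception is thrown when both the inputCol and inputCols parameters are set. The threshold parameter applies to single-column use, and thresholds applies to multi-column use.
New in version 1.4.0.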
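To make the thresholding rule concrete, here is a minimal plain-Python sketch of the same comparison. It does not use Spark and is not the library's implementation. The helper names `binarize` and `binarize_row` are hypothetical, and the equal-length check on the column and threshold lists is an assumption made for this illustration.

```python
from typing import Dict, Sequence


def binarize(value: float, threshold: float = 0.0) -> float:
    """Return 1.0 if value is strictly greater than threshold, else 0.0."""
    return 1.0 if value > threshold else 0.0


def binarize_row(
    row: Dict[str, float],
    input_cols: Sequence[str],
    output_cols: Sequence[str],
    thresholds: Sequence[float],
) -> Dict[str, float]:
    """Apply one threshold per input column and add the output columns."""
    # Assumption of this sketch: the three lists are matched by position,
    # so they must have the same length.
    if not (len(input_cols) == len(output_cols) == len(thresholds)):
        raise ValueError("input_cols, output_cols and thresholds must have equal length")
    result = dict(row)
    for in_col, out_col, threshold in zip(input_cols, output_cols, thresholds):
        result[out_col] = binarize(row[in_col], threshold)
    return result


# Single-column case: 0.5 is not greater than 1.0, but it is greater than -0.5.
single_high = binarize(0.5, threshold=1.0)
single_low = binarize(0.5, threshold=-0.5)

# Multi-column case: each column is compared against its own threshold.
multi = binarize_row(
    {"values1": 0.5, "values2": 0.3},
    input_cols=["values1", "values2"],
    output_cols=["output1", "output2"],
    thresholds=[0.0, 1.0],
)
print(single_high, single_low, multi)
```

The inputs are the same values and thresholds that appear in the Spark example on this page, so the results can be compared directly with the Binarizer output shown there.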
Example:
>>> from pyspark.ml.feature import Binarizer
>>> df = spark.createDataFrame([(0.5,)], ["values"])
>>> binarizer = Binarizer(threshold=1.0, inputCol="values", outputCol="features")
>>> binarizer.setThreshold(1.0)
Binarizer...
>>> binarizer.setInputCol("values")
Binarizer...
>>> binarizer.setOutputCol("features")
Binarizer...
>>> binarizer.transform(df).head().features
0.0
>>> binarizer.setParams(outputCol="freqs").transform(df).head().freqs
0.0
>>> params = {binarizer.threshold: -0.5, binarizer.outputCol: "vector"}
>>> binarizer.transform(df, params).head().vector
1.0
>>> binarizerPath = temp_path + "/binarizer"
>>> binarizer.save(binarizerPath)
>>> loadedBinarizer = Binarizer.load(binarizerPath)
>>> loadedBinarizer.getThreshold() == binarizer.getThreshold()
True
>>> loadedBinarizer.transform(df).take(1) == binarizer.transform(df).take(1)
True
>>> df2 = spark.createDataFrame([(0.5, 0.3)], ["values1", "values2"])
>>> binarizer2 = Binarizer(thresholds=[0.0, 1.0])
>>> binarizer2.setInputCols(["values1", "values2"]).setOutputCols(["output1", "output2"])
Binarizer...
>>> binarizer2.transform(df2).show()
+-------+-------+-------+-------+
|values1|values2|output1|output2|
+-------+-------+-------+-------+
|    0.5|    0.3|    1.0|    0.0|
+-------+-------+-------+-------+
...
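Note: This article was selected and compiled by 纯净天空 from the original English documentation for pyspark.ml.feature.Binarizer on spark.apache.org. Unless otherwise stated, copyright in the original code belongs to its original authors. Do not reproduce or copy this translation without permission or authorization.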