This page briefly describes how to use pyspark.ml.feature.StopWordsRemover.
Usage:
class pyspark.ml.feature.StopWordsRemover(*, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False, locale=None, inputCols=None, outputCols=None)
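A feature transformer that filters stop words out of the input. Since 3.0.0, StopWordsRemover can filter multiple columns at once when the inputCols parameter is set. Note that an exception is thrown when both the inputCol and inputCols parameters are set.
New in version 1.6.0.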
Notes:
Null values in the input array are preserved unless null is explicitly added to stopWords.
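The filtering rule described above (case-insensitive matching by default, nulls kept unless null is itself a stop word) can be modeled in plain Python. This is only a sketch of the semantics, not Spark's implementation: `remove_stop_words` is a hypothetical helper, Python's `None` stands in for null, and `str.lower` is used in place of the locale-aware lowercasing controlled by the `locale` parameter.

```python
from typing import Iterable, List, Optional


def remove_stop_words(
    tokens: Iterable[Optional[str]],
    stop_words: Iterable[Optional[str]],
    case_sensitive: bool = False,
) -> List[Optional[str]]:
    """Hypothetical pure-Python model of the stop-word filtering rule."""
    stop_list = list(stop_words)
    # Nulls are dropped only when null was explicitly listed as a stop word.
    drop_null = None in stop_list

    def normalize(word: str) -> str:
        # Assumption: plain lowercasing; locale handling is ignored here.
        return word if case_sensitive else word.lower()

    stop_set = {normalize(w) for w in stop_list if w is not None}

    kept: List[Optional[str]] = []
    for token in tokens:
        if token is None:
            if not drop_null:
                kept.append(token)
        elif normalize(token) not in stop_set:
            kept.append(token)
    return kept


print(remove_stop_words(["a", "b", "c"], ["b"]))         # ['a', 'c']
print(remove_stop_words(["a", None, "B"], ["b"]))        # ['a', None]
print(remove_stop_words(["a", None, "B"], ["b", None]))  # ['a']
```

With `case_sensitive=True`, the token "B" would survive a stop-word list containing only "b", which mirrors the role of the caseSensitive parameter in the signature.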
Examples:
>>> # Assumes an active SparkSession `spark` and a writable directory `temp_path`.
>>> from pyspark.ml.feature import StopWordsRemover
>>> df = spark.createDataFrame([(["a", "b", "c"],)], ["text"])
>>> remover = StopWordsRemover(stopWords=["b"])
>>> remover.setInputCol("text")
StopWordsRemover...
>>> remover.setOutputCol("words")
StopWordsRemover...
>>> remover.transform(df).head().words == ['a', 'c']
True
>>> stopWordsRemoverPath = temp_path + "/stopwords-remover"
>>> remover.save(stopWordsRemoverPath)
>>> loadedRemover = StopWordsRemover.load(stopWordsRemoverPath)
>>> loadedRemover.getStopWords() == remover.getStopWords()
True
>>> loadedRemover.getCaseSensitive() == remover.getCaseSensitive()
True
>>> loadedRemover.transform(df).take(1) == remover.transform(df).take(1)
True
>>> df2 = spark.createDataFrame([(["a", "b", "c"], ["a", "b"])], ["text1", "text2"])
>>> remover2 = StopWordsRemover(stopWords=["b"])
>>> remover2.setInputCols(["text1", "text2"]).setOutputCols(["words1", "words2"])
StopWordsRemover...
>>> remover2.transform(df2).show()
+---------+------+------+------+
|    text1| text2|words1|words2|
+---------+------+------+------+
|[a, b, c]|[a, b]|[a, c]|   [a]|
+---------+------+------+------+
...
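Note: This article was selected and compiled by 純淨天空 from the original English documentation for pyspark.ml.feature.StopWordsRemover at spark.apache.org. Unless otherwise stated, copyright in the original code belongs to the original author. Please do not reproduce or copy this translation without permission or authorization.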