This article briefly introduces the usage of pyspark.ml.fpm.PrefixSpan.
Usage:
class pyspark.ml.fpm.PrefixSpan(*, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=32000000, sequenceCol='sequence')
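A parallel PrefixSpan algorithm for mining frequent sequential patterns. The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. This class is not yet an Estimator/Transformer; use the findFrequentSequentialPatterns() method to run the PrefixSpan algorithm.
New in version 2.4.0.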
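To make "frequent sequential pattern" concrete, the following is a minimal pure-Python sketch of how the support of one pattern could be counted. It is not the PrefixSpan algorithm and not Spark's implementation; it only checks the definition by brute force, using the four sequences from the example on this page. The helper names `contains` and `support_count` are hypothetical, and the threshold `ceil(minSupport * number of sequences)` is an assumption about how the `minSupport` fraction maps to a count.

```python
from math import ceil


def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as an ordered sub-sequence.

    Both arguments are lists of itemsets. Each pattern itemset must be a
    subset of a distinct itemset of the sequence, and matches must keep order.
    """
    pos = 0
    for wanted in pattern:
        wanted = set(wanted)
        # Greedily advance to the earliest itemset that covers `wanted`.
        while pos < len(sequence) and not wanted.issubset(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a later itemset
    return True


def support_count(sequences, pattern):
    """Count how many sequences contain `pattern`."""
    return sum(contains(seq, pattern) for seq in sequences)


# The four input sequences from the example on this page.
sequences = [
    [[1, 2], [3]],
    [[1], [3, 2], [1, 2]],
    [[1, 2], [5]],
    [[6]],
]

min_support = 0.5
# Assumption: a pattern is frequent when its count reaches this threshold.
min_count = ceil(min_support * len(sequences))

for pattern in ([[1]], [[1], [3]], [[2]], [[2, 1]], [[3]], [[5]]):
    freq = support_count(sequences, pattern)
    label = "frequent" if freq >= min_count else "infrequent"
    print(pattern, freq, label)
```

Items inside one itemset are treated as unordered here, so `[[2, 1]]` and `[[1, 2]]` denote the same pattern. Under these assumptions the counts agree with the `freq` column of the example, and `[[5]]` falls below the threshold because it occurs in only one sequence.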
Notes:
See Sequential Pattern Mining (Wikipedia).
Examples:
>>> from pyspark.ml.fpm import PrefixSpan
>>> from pyspark.sql import Row
>>> df = sc.parallelize([Row(sequence=[[1, 2], [3]]),
...                      Row(sequence=[[1], [3, 2], [1, 2]]),
...                      Row(sequence=[[1, 2], [5]]),
...                      Row(sequence=[[6]])]).toDF()
>>> prefixSpan = PrefixSpan()
>>> prefixSpan.getMaxLocalProjDBSize()
32000000
>>> prefixSpan.getSequenceCol()
'sequence'
>>> prefixSpan.setMinSupport(0.5)
PrefixSpan...
>>> prefixSpan.setMaxPatternLength(5)
PrefixSpan...
>>> prefixSpan.findFrequentSequentialPatterns(df).sort("sequence").show(truncate=False)
+----------+----+
|sequence  |freq|
+----------+----+
|[[1]]     |3   |
|[[1], [3]]|2   |
|[[2]]     |3   |
|[[2, 1]]  |3   |
|[[3]]     |2   |
+----------+----+
...
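The doctest above relies on a pre-existing SparkContext named `sc`. As a hedged sketch, the same run could be wrapped in a small helper that takes a SparkSession and passes the parameters through the keyword-only constructor shown in the signature instead of the setters. The helper name `find_patterns` and its argument names are hypothetical; the PySpark imports are deferred into the function body only so that the helper can be defined without a running Spark installation.

```python
def find_patterns(spark, sequences, min_support=0.5, max_pattern_length=5):
    """Run PrefixSpan on `sequences` and return the result DataFrame.

    `spark` is an existing SparkSession; `sequences` is a list of
    sequences, each a list of itemsets (lists of items).
    """
    # Deferred imports: PySpark is only needed when the helper is called.
    from pyspark.ml.fpm import PrefixSpan
    from pyspark.sql import Row

    # One row per sequence, stored in the default 'sequence' column.
    df = spark.createDataFrame([Row(sequence=s) for s in sequences])

    # Same settings as the doctest, given as constructor keyword arguments.
    prefix_span = PrefixSpan(
        minSupport=min_support,
        maxPatternLength=max_pattern_length,
        sequenceCol="sequence",
    )

    # Returns a DataFrame with the columns `sequence` and `freq`.
    return prefix_span.findFrequentSequentialPatterns(df)
```

With a local session, for instance one obtained from `SparkSession.builder.master("local[1]").getOrCreate()`, calling `find_patterns(spark, data).sort("sequence").show(truncate=False)` on the four sequences of the doctest would be expected to print the same table as above.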
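Note: This article was selected and compiled by 純淨天空 from the original English documentation of pyspark.ml.fpm.PrefixSpan at spark.apache.org. Unless otherwise stated, copyright of the original code belongs to the original authors; please do not reproduce or copy this translation without permission or authorization.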