R dials vocabulary_size: Number of tokens in vocabulary

Used in textrecipes::step_tokenize_sentencepiece() and textrecipes::step_tokenize_bpe().

vocabulary_size(range = c(1000L, 32000L), trans = NULL)

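range: A two-element vector holding the defaults for the smallest and largest possible values, respectively. If a transformation is specified, these values should be in the transformed units.
trans: A trans object from the scales package, such as scales::log10_trans() or scales::reciprocal_trans(). If not provided, a default that matches the units used in range is used. Use NULL for no transformation.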
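The two arguments can be combined as follows. This is a minimal sketch, assuming the dials and scales packages are installed; the ranges c(500L, 4000L) and c(3, 4) are arbitrary illustrative choices, not recommended defaults. Because range is given in transformed units when trans is set, c(3, 4) on the log10 scale corresponds to 1,000 through 10,000 tokens.

```r
library(dials)

# Narrow the default search space to smaller vocabularies (no transformation)
small_vocab <- vocabulary_size(range = c(500L, 4000L))
small_vocab

# Search on a log10 scale: range is in transformed units,
# so c(3, 4) means 10^3 = 1000 up to 10^4 = 10000 tokens
log_vocab <- vocabulary_size(range = c(3, 4), trans = scales::log10_trans())
log_vocab

# Bounds mapped back to the original (untransformed) units
range_get(log_vocab, original = TRUE)

# Three candidates evenly spaced on the log scale, reported in original units
value_seq(log_vocab, n = 3)
```

A log scale spends more candidates on small vocabularies, where a change of a few hundred tokens tends to matter more than it does near the upper end.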

vocabulary_size()
#> # Unique Tokens in Vocabulary (quantitative)
#> Range: [1000, 32000]
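Candidate values for tuning can be drawn from the default parameter object. This is a minimal sketch, assuming dials is installed; levels = 4, n = 5 and the seed are illustrative choices only.

```r
library(dials)

vocab <- vocabulary_size()

# Regular grid: 4 evenly spaced vocabulary sizes from 1000 to 32000
regular <- grid_regular(vocab, levels = 4)
regular

# Random candidates drawn from the same range
set.seed(1)
sampled <- value_sample(vocab, n = 5)
sampled
```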


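Note: This article was compiled by 纯净天空 from the original English work "Number of tokens in vocabulary" by Max Kuhn et al. Unless otherwise stated, the copyright of the original code belongs to the original authors. This translation may not be reproduced or copied without permission.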