Python tf.keras.preprocessing.text.text_to_word_sequence usage and code examples

Converts a text to a sequence of words (or tokens).

Usage

tf.keras.preprocessing.text.text_to_word_sequence(
    input_text,
    filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
    lower=True,
    split=' '
)

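input_text Input text (string).
filters A list (or concatenation) of characters to filter out, such as punctuation. Default: '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', which includes basic punctuation, tabs, and newlines.
lower Boolean. Whether to convert the input to lowercase.
split str. The separator used to split the text into words.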

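This function transforms a string of text into a list of words while ignoring the characters in filters, which by default include punctuation.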

import tensorflow as tf

sample_text = 'This is a sample sentence.'
tf.keras.preprocessing.text.text_to_word_sequence(sample_text)
# ['this', 'is', 'a', 'sample', 'sentence']
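
To make the roles of filters, lower and split more concrete, the following is a rough plain-Python sketch of the behavior described above. It is not the library's source code: the name sketch_text_to_word_sequence is hypothetical, and it assumes that each filtered character is replaced by the separator and that empty tokens are discarded.

```python
# Illustrative sketch only; sketch_text_to_word_sequence is a hypothetical name,
# not part of TensorFlow or Keras.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def sketch_text_to_word_sequence(input_text, filters=DEFAULT_FILTERS,
                                 lower=True, split=' '):
    """Approximate the documented behavior in plain Python."""
    if lower:
        input_text = input_text.lower()
    # Assumption: each filtered character is replaced by the separator,
    # so punctuation between words also acts as a word boundary.
    table = str.maketrans({ch: split for ch in filters})
    tokens = input_text.translate(table).split(split)
    # Drop empty strings left behind by consecutive separators.
    return [tok for tok in tokens if tok]


print(sketch_text_to_word_sequence('This is a sample sentence.'))
# ['this', 'is', 'a', 'sample', 'sentence']

print(sketch_text_to_word_sequence('Hello, World!', lower=False))
# ['Hello', 'World']

print(sketch_text_to_word_sequence('red,green;blue', filters=';', split=','))
# ['red', 'green', 'blue']
```

The last call shows how filters and split interact under these assumptions: the semicolon is listed in filters, so it is turned into the comma separator before the text is split.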

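Note: This article was selected and compiled by 纯净天空 from the original English documentation for tf.keras.preprocessing.text.text_to_word_sequence on tensorflow.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.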