Python cudf.core.column.string.StringMethods.ngrams usage and code examples

Usage: StringMethods.ngrams(n: int = 2, separator: str = '_') → SeriesOrIndex

Generate n-grams from a set of tokens. Each record in the Series is treated as a single token.
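The sliding-window idea can be sketched in plain Python. This is a minimal sketch, assuming the records are already ordinary Python strings; `ngrams_sketch` is a hypothetical helper, not part of cudf, and it does not model how cudf treats nulls or empty strings.

```python
from typing import List


def ngrams_sketch(tokens: List[str], n: int = 2, separator: str = "_") -> List[str]:
    """Join every window of n consecutive tokens with `separator`.

    Hypothetical helper for illustration only; cudf computes this on the GPU.
    """
    if n < 2:
        # Choice made by this sketch: an n-gram needs at least two tokens.
        raise ValueError("n must be 2 or greater")
    # A list of m tokens yields max(m - n + 1, 0) n-grams.
    return [separator.join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


# Each record counts as one token, even if it contains spaces.
print(ngrams_sketch(["this is my", "favorite book"]))  # ['this is my_favorite book']
print(ngrams_sketch(["abc", "def", "xyz", "hhh"]))     # ['abc_def', 'def_xyz', 'xyz_hhh']
```

With fewer than `n` records this sketch simply returns an empty list.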

You can generate tokens from a Series instance with the Series.str.tokenize() function.
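Tokenizing first changes what counts as a token: each word becomes its own record, so the n-grams are built from words rather than from whole strings. The plain-Python sketch below imitates that pipeline, assuming whitespace tokenization and assuming that `tokenize()` flattens the tokens of all rows into one sequence (so n-grams can span row boundaries). The helper names are hypothetical; on a GPU machine the cudf pipeline would be roughly `cudf.Series([...]).str.tokenize().str.ngrams(2, "_")`.

```python
from typing import List


def tokenize_sketch(records: List[str]) -> List[str]:
    # Split every record on whitespace and flatten all words into one list,
    # approximating Series.str.tokenize() with its default delimiter.
    return [word for record in records for word in record.split()]


def word_bigrams(records: List[str], separator: str = "_") -> List[str]:
    tokens = tokenize_sketch(records)
    # Pair each token with its successor, as ngrams(2, separator) would.
    return [separator.join(pair) for pair in zip(tokens, tokens[1:])]


print(word_bigrams(["this is my", "favorite book"]))
# ['this_is', 'is_my', 'my_favorite', 'favorite_book']
```

Compare this with the untokenized example on this page, where the same two records produce a single bigram.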

Parameters:

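n: int: The degree of the n-gram (the number of consecutive tokens). The default, 2, produces bigrams.
separator: str: The string placed between the tokens within each n-gram. Defaults to '_'.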
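To see how the two parameters interact, the sketch below (plain Python, with a hypothetical helper `make_ngrams` that is not the cudf API) applies different `n` and `separator` values to the four-record example on this page. Raising `n` lengthens each n-gram and removes one output row per step, while `separator` only changes the joining string.

```python
from typing import List


def make_ngrams(tokens: List[str], n: int, separator: str) -> List[str]:
    # Window i covers tokens[i], ..., tokens[i + n - 1].
    return [separator.join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["abc", "def", "xyz", "hhh"]
for n, sep in [(2, "_"), (3, "_"), (3, " ")]:
    print(n, repr(sep), make_ngrams(tokens, n, sep))
# 2 '_' ['abc_def', 'def_xyz', 'xyz_hhh']
# 3 '_' ['abc_def_xyz', 'def_xyz_hhh']
# 3 ' ' ['abc def xyz', 'def xyz hhh']
```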

Examples:

>>> import cudf
>>> str_series = cudf.Series(['this is my', 'favorite book'])
>>> str_series.str.ngrams(2, "_")
0    this is my_favorite book
dtype: object
>>> str_series = cudf.Series(['abc','def','xyz','hhh'])
>>> str_series.str.ngrams(2, "_")
0    abc_def
1    def_xyz
2    xyz_hhh
dtype: object


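Note: This article was selected and compiled by 純淨天空 from the original English documentation of cudf.core.column.string.StringMethods.ngrams by rapids.ai. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.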