當前位置: 首頁>>編程示例 >>用法及示例精選 >>正文

Python tf.strings.unicode_split用法及代碼示例

將 input 中的每個字符串拆分為一係列 Unicode 代碼點。

用法

tf.strings.unicode_split(
    input, input_encoding, errors='replace', replacement_char=65533,
    name=None
)

參數

input 一個 N 維度可能參差不齊的 string 張量，形狀為 [D1...DN] 。 N 必須是靜態已知的。
input_encoding 用於解碼每個字符串的 unicode 編碼的字符串名稱。
errors
指定無法使用指示的編碼轉換輸入字符串時的響應。之一：
- 'strict' ：為任何非法子字符串引發異常。
- 'replace' ：用 replacement_char 替換非法子字符串。
- 'ignore' ：跳過非法子串。
replacement_char 當 errors='replace' 時用於代替 input 中的無效子字符串的替換代碼點。
name 操作的名稱(可選)。

返回

一個 N+1 維度 int32 張量，形狀為 [D1...DN, (num_chars)] 。如果input 是標量，則返回的張量為tf.Tensor，否則為tf.RaggedTensor。

result[i1...iN, j] 是 input[i1...iN] 的子字符串，當使用 input_encoding 解碼時，它編碼其第 j 個字符。

例子：

input = [s.encode('utf8') for s in (u'G\xf6\xf6dnight', u'\U0001f60a')]
tf.strings.unicode_split(input, 'UTF-8').to_list()
[[b'G', b'\xc3\xb6', b'\xc3\xb6', b'd', b'n', b'i', b'g', b'h', b't'],
 [b'\xf0\x9f\x98\x8a']]

相關用法

注：本文由純淨天空篩選整理自tensorflow.org大神的英文原創作品 tf.strings.unicode_split。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。