This article collects typical usage examples of the Python method nltk.tokenize.TreebankWordTokenizer.span_tokenize. If you are unsure how to use TreebankWordTokenizer.span_tokenize, the curated example below may help. You can also look further into the class it belongs to, nltk.tokenize.TreebankWordTokenizer, for related usage.
One code example of the TreebankWordTokenizer.span_tokenize method is shown below.
Example 1: test_treebank_span_tokenizer
# Required import: from nltk.tokenize import TreebankWordTokenizer
# Method under test: TreebankWordTokenizer.span_tokenize
def test_treebank_span_tokenizer(self):
    """
    Test the TreebankWordTokenizer.span_tokenize method.
    """
    tokenizer = TreebankWordTokenizer()

    # Test case from the docstring (note the double space after "(York).",
    # which the gap between spans (37, 38) and (40, 46) below depends on)
    test1 = "Good muffins cost $3.88\nin New (York).  Please (buy) me\ntwo of them.\n(Thanks)."
    expected = [
        (0, 4), (5, 12), (13, 17), (18, 19), (19, 23),
        (24, 26), (27, 30), (31, 32), (32, 36), (36, 37), (37, 38),
        (40, 46), (47, 48), (48, 51), (51, 52), (53, 55), (56, 59),
        (60, 62), (63, 68), (69, 70), (70, 76), (76, 77), (77, 78)
    ]
    # span_tokenize yields (start, end) spans, so materialise it as a list
    result = list(tokenizer.span_tokenize(test1))
    self.assertEqual(result, expected)

    # Test case with double quotation marks
    test2 = "The DUP is similar to the \"religious right\" in the United States and takes a hardline stance on social issues"
    expected = [
        (0, 3), (4, 7), (8, 10), (11, 18), (19, 21), (22, 25), (26, 27),
        (27, 36), (37, 42), (42, 43), (44, 46), (47, 50), (51, 57), (58, 64),
        (65, 68), (69, 74), (75, 76), (77, 85), (86, 92), (93, 95), (96, 102),
        (103, 109)
    ]
    result = list(tokenizer.span_tokenize(test2))
    self.assertEqual(result, expected)

    # Test case with double quotation marks as well as already-converted
    # Treebank-style quotes (`` and '')
    test3 = "The DUP is similar to the \"religious right\" in the United States and takes a ``hardline'' stance on social issues"
    expected = [
        (0, 3), (4, 7), (8, 10), (11, 18), (19, 21), (22, 25), (26, 27),
        (27, 36), (37, 42), (42, 43), (44, 46), (47, 50), (51, 57), (58, 64),
        (65, 68), (69, 74), (75, 76), (77, 79), (79, 87), (87, 89), (90, 96),
        (97, 99), (100, 106), (107, 113)
    ]
    result = list(tokenizer.span_tokenize(test3))
    self.assertEqual(result, expected)
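Outside a test harness, the spans returned by span_tokenize can be used to slice the tokens back out of the original string, since each (start, end) pair is a character offset into the input. A minimal sketch (assumes nltk is installed; the sample sentence is not from the test above):

```python
from nltk.tokenize import TreebankWordTokenizer

text = "Good muffins cost $3.88\nin New York."
tokenizer = TreebankWordTokenizer()

# span_tokenize yields (start, end) character offsets into the original text,
# so slicing the input with each span recovers the corresponding token verbatim.
spans = list(tokenizer.span_tokenize(text))
tokens = [text[start:end] for start, end in spans]

print(spans)
print(tokens)
```

Because the spans index into the untouched input string, this round-trip works even for tokens the tokenizer would normally rewrite when using tokenize() (for example, opening double quotes become ``).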