Python TreebankWordTokenizer.span_tokenize方法代码示例

本文整理汇总了Python中nltk.tokenize.TreebankWordTokenizer.span_tokenize方法的典型用法代码示例。如果您正苦于以下问题：Python TreebankWordTokenizer.span_tokenize方法的具体用法？Python TreebankWordTokenizer.span_tokenize怎么用？Python TreebankWordTokenizer.span_tokenize使用的例子？那么恭喜您, 这里精选的方法代码示例或许可以为您提供帮助。您也可以进一步了解该方法所在类nltk.tokenize.TreebankWordTokenizer的用法示例。

在下文中一共展示了TreebankWordTokenizer.span_tokenize方法的1个代码示例，这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞，您的评价将有助于系统推荐出更棒的Python代码示例。

示例1: test_treebank_span_tokenizer

# 需要导入模块: from nltk.tokenize import TreebankWordTokenizer [as 别名]
# 或者: from nltk.tokenize.TreebankWordTokenizer import span_tokenize [as 别名]
    def test_treebank_span_tokenizer(self):
        """
        Test TreebankWordTokenizer.span_tokenize function
        """

        tokenizer = TreebankWordTokenizer()

        # Test case in the docstring
        test1 = "Good muffins cost $3.88\nin New (York).  Please (buy) me\ntwo of them.\n(Thanks)."
        expected = [
            (0, 4), (5, 12), (13, 17), (18, 19), (19, 23),
            (24, 26), (27, 30), (31, 32), (32, 36), (36, 37), (37, 38),
            (40, 46), (47, 48), (48, 51), (51, 52), (53, 55), (56, 59),
            (60, 62), (63, 68), (69, 70), (70, 76), (76, 77), (77, 78)
        ]
        result = tokenizer.span_tokenize(test1)
        self.assertEqual(result, expected)

        # Test case with double quotation
        test2 = "The DUP is similar to the \"religious right\" in the United States and takes a hardline stance on social issues"
        expected = [
            (0, 3), (4, 7), (8, 10), (11, 18), (19, 21), (22, 25), (26, 27),
            (27, 36), (37, 42), (42, 43), (44, 46), (47, 50), (51, 57), (58, 64),
            (65, 68), (69, 74), (75, 76), (77, 85), (86, 92), (93, 95), (96, 102),
            (103, 109)
        ]
        result = tokenizer.span_tokenize(test2)
        self.assertEqual(result, expected)

        # Test case with double qoutation as well as converted quotations
        test3 = "The DUP is similar to the \"religious right\" in the United States and takes a ``hardline'' stance on social issues"
        expected = [
            (0, 3), (4, 7), (8, 10), (11, 18), (19, 21), (22, 25), (26, 27),
            (27, 36), (37, 42), (42, 43), (44, 46), (47, 50), (51, 57), (58, 64),
            (65, 68), (69, 74), (75, 76), (77, 79), (79, 87), (87, 89), (90, 96),
            (97, 99), (100, 106), (107, 113)
        ]
        result = tokenizer.span_tokenize(test3)
        self.assertEqual(result, expected)

开发者ID:alpaco42，项目名称:ML_Spring_2018，代码行数:41，代码来源:test_tokenize.py

注：本文中的nltk.tokenize.TreebankWordTokenizer.span_tokenize方法示例由纯净天空整理自Github/MSDocs等开源代码及文档管理平台，相关代码片段筛选自各路编程大神贡献的开源项目，源码版权归原作者所有，传播和使用请参考对应项目的License；未经允许，请勿转载。