This article collects typical usage examples of the `preprocess_text` method of Python's `document.Document` class. If you have been wondering how `Document.preprocess_text` is used in practice, the hand-picked code examples below may help; you can also explore the `document.Document` class itself for further context.
The following shows 2 code examples of `Document.preprocess_text`, sorted by popularity by default.
Example 1: __init__
# Required import: from document import Document [as alias]
# Or: from document.Document import preprocess_text [as alias]
# Also requires: from glob import glob, from time import sleep
def __init__(self):
    for dir_ in glob(self.master_dir + "/*"):
        print("\nProcessing", dir_)
        for essay in glob(dir_ + "/*"):  # essays nested in subdirs
            if essay not in self.essay_vectors:
                print("\nDoubleChecking", essay)
                doc = Document(essay, "Wil")
                doc.document_to_text(essay, essay)  # should probably truncate the first "essay" argument to just the filename
                doc.preprocess_text()
                doc.statistics()
                errors = doc.proofread()
                err_stats = {'grammar': 0,
                             'suggestion': 0,
                             'spelling': 0}
                try:
                    for err in errors:
                        err_stats[err["type"]] += 1
                except TypeError:
                    print("No errors!")
                token_sentence_ratio = doc.stats['tokens'] / doc.stats['sentences']
                self.essay_vectors[essay] = [
                    err_stats['grammar'],
                    err_stats['suggestion'],
                    err_stats['spelling'],
                    token_sentence_ratio
                ]
                print("Completed " + essay + ". Sleeping...")
                sleep(10)
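The try/except around the error tally in Example 1 guards against `proofread()` returning `None` instead of a list. The same tally can be written more directly with `collections.Counter`; a minimal sketch, where `tally_errors` is a hypothetical helper and the assumption that errors are dicts with a `"type"` key comes from the example above:

```python
from collections import Counter

def tally_errors(errors):
    """Count proofreading errors by type; tolerates errors being None."""
    counts = Counter({'grammar': 0, 'suggestion': 0, 'spelling': 0})
    for err in errors or []:   # `or []` replaces the TypeError catch
        counts[err["type"]] += 1
    return counts

stats = tally_errors([{"type": "grammar"}, {"type": "spelling"}, {"type": "grammar"}])
print(stats['grammar'], stats['suggestion'], stats['spelling'])  # 2 0 1
```

Pre-seeding the `Counter` keeps all three keys present even when a category has no errors, so the feature vector built afterwards always has the same shape.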
Example 2: test_word_tokenizing
# Required import: from document import Document [as alias]
# Or: from document.Document import preprocess_text [as alias]
def test_word_tokenizing(self):
    text = "This is a test sentence."
    with open("../process/tmp_test_file.txt", "w") as test_file:
        test_file.write(text)
    d = Document("tmp_test_file.txt", "testuser")
    d.preprocess_text()
    self.assertEqual(d.preprocessed['tokens'], 6,
                     "word tokenizing failed, incorrect number of tokens")
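The expected count of 6 in Example 2 (five words plus the trailing period) implies a word tokenizer that emits punctuation as separate tokens, as NLTK-style tokenizers do. A minimal stand-in using only the standard library; the regex here is an illustration of that tokenization behavior, not the actual implementation inside `preprocess_text`:

```python
import re

def word_tokenize(text):
    # Match runs of word characters, or any single
    # non-word, non-space character (punctuation).
    return re.findall(r"\w+|[^\w\s]", text)

tokens = word_tokenize("This is a test sentence.")
print(tokens)       # ['This', 'is', 'a', 'test', 'sentence', '.']
print(len(tokens))  # 6
```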