This page collects a typical usage example of the Python method corpus.Corpus.add_to_corpus, for readers who want to know how Corpus.add_to_corpus is called in practice. The method belongs to the class corpus.Corpus.
One code example of Corpus.add_to_corpus is shown below.
Example 1: create_dict
# Required import: from corpus import Corpus [as alias]
# add_to_corpus is a method of Corpus, so it is called on a Corpus instance rather than imported on its own
import os
import pickle

from corpus import Corpus


def create_dict():
    '''
    Using parsed unicode files stored in the corpus folder, add the words from
    each file to a Corpus object for later use.
    '''
    corpus = Corpus()
    CORPUS_DIR = "./corpus"
    files = os.listdir(CORPUS_DIR)
    print(files)
    # only keep unicode files
    files = [file_ for file_ in files if ".unicode" in file_]
    # counter
    num_files = len(files)
    current = 1
    print("\nAdding " + str(num_files) + " total files.")
    # iterate over all unicode files in the directory and process them
    for file_ in files:
        print("Adding " + file_ + " (" + str(current) + " of " + str(num_files) + ") to corpus")
        current += 1
        path = os.path.join(CORPUS_DIR, file_)
        # if the pre-processed unicode file exists, add it to the corpus
        if os.path.exists(path):
            # Python 3: decode as UTF-8 while reading (the original
            # Python 2 code called .decode("utf-8") on the raw bytes)
            with open(path, encoding="utf-8") as handle:
                unicode_ = handle.read()
            # split file on single spaces and add words
            for word in unicode_.split(" "):
                corpus.add_to_corpus(word)
    print("Corpus successfully built. Saving corpus to corpus.pickle")
    # pickle writes bytes, so the file must be opened in binary mode
    with open("./corpus.pickle", "wb") as file_:
        pickle.dump(corpus, file_)
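The example above never shows the Corpus class itself, so what add_to_corpus does with each word is not known from this page. The sketch below is a hypothetical stand-in, assuming the method counts word frequencies in a `counts` attribute and ignores empty tokens; the attribute name, the `build_corpus` helper, and the counting behavior are all assumptions made for illustration, not the real corpus.Corpus API.

```python
from collections import Counter


class Corpus:
    """Hypothetical stand-in; the real corpus.Corpus is not shown in the source."""

    def __init__(self):
        # Assumed storage: maps each word to the number of times it was added.
        self.counts = Counter()

    def add_to_corpus(self, word):
        # Assumed behavior: strip surrounding whitespace and skip empty tokens.
        word = word.strip()
        if word:
            self.counts[word] += 1


def build_corpus(text):
    """Hypothetical helper: feed every space-separated token to add_to_corpus."""
    corpus = Corpus()
    for word in text.split(" "):
        corpus.add_to_corpus(word)
    return corpus


corpus = build_corpus("the cat and the hat")
print(corpus.counts.most_common(1))
```

Splitting on a single space, as create_dict does, produces empty strings for consecutive spaces and leaves newlines attached to words, which is why this sketch strips each token before counting it.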