This page collects typical usage examples of the Python method tokenizer.Tokenizer.ngrams. For more context, see the usage examples for the class that contains it, tokenizer.Tokenizer.
One code example of the Tokenizer.ngrams method is shown below.
Example 1: dict
# Required import: from tokenizer import Tokenizer [as alias]
# Or: from tokenizer.Tokenizer import ngrams [as alias]
import json
from collections import Counter

# NOTE: `data`, `root` and `filename` are defined elsewhere in the original
# project and are not shown in this excerpt.

# number of reviews a token has to appear in to be kept
hardthreshold = 2
print("> Loading data")
alltoken = data.loadFile(root + '/computed/alltoken.pkl')
print("> Scanning data")
print("Loading file", filename)
reviews_feature = dict()
reviews_score = dict()
tok = Tokenizer(preserve_case=True)
# extracting tokens, one JSON-encoded review per line
for line in data.generateLine(filename):
    review = json.loads(line)
    reviewid = review['review_id']
    # unigrams, bigrams and trigrams of the review text
    text = tok.ngrams(review['text'], 1, 3)
    score = int(review['stars'])
    # keep only the tokens that are in the model
    text = [k for k in text if k in alltoken]
    reviews_feature[reviewid] = Counter(text)
    reviews_score[reviewid] = score
print("> End of full scan")
print("> Saving")
data.saveFile(reviews_feature, root + "/computed/reviews_feature.pkl")
data.saveFile(reviews_score, root + "/computed/reviews_score.pkl")
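
The example relies on project-specific helpers (Tokenizer, data) that are not shown here, so it cannot be run on its own. The sketch below is a rough, self-contained illustration of the same idea: build all 1- to 3-grams of a review and count only those found in a known vocabulary. The functions ngrams and review_features are hypothetical stand-ins, not the project's API. Splitting on whitespace and joining n-grams with spaces are assumptions, since the real Tokenizer.ngrams may tokenize and represent n-grams differently.

```python
from collections import Counter


def ngrams(tokens, min_n, max_n):
    """Return every contiguous n-gram of length min_n..max_n as a space-joined string.

    Hypothetical stand-in for Tokenizer.ngrams; the real return format is unknown.
    """
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def review_features(text, vocabulary, min_n=1, max_n=3):
    """Count the n-grams of `text` that also appear in `vocabulary`."""
    # Assumption: plain whitespace splitting replaces the real tokenizer.
    tokens = text.split()
    return Counter(g for g in ngrams(tokens, min_n, max_n) if g in vocabulary)


# Toy vocabulary playing the role of `alltoken` in the example above.
vocabulary = {"good", "good food", "not"}
features = review_features("good food not good", vocabulary)
print(features)
```

Using a set for the vocabulary keeps each membership test at constant time on average, which matters when every review yields many n-grams.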