Python Dataset.build_vocabulary Method Code Examples

This article collects typical usage examples of the dataset.Dataset.build_vocabulary method in Python. If you are wondering how Dataset.build_vocabulary is called in practice, the selected examples here may help. The same examples also show several other methods of the enclosing class, dataset.Dataset.


Two code examples of the Dataset.build_vocabulary method are shown here, sorted by popularity by default.

Example 1: read_data

import numpy as np
from dataset import Dataset  # project-local module from the source repository

# Reader is not defined in this snippet; it presumably comes from the
# source project's reader.py, where this function appears.
def read_data(dirname, return_dataset=False):
    ds = Dataset(dirname, Reader.get_classes())
    emails, classes = [], []

    # Collect the words of each email and grow the vocabulary as we go.
    for sentences, email_type in ds.get_text():
        ds.build_vocabulary(sentences)
        emails.append(sentences)
        classes.append(email_type)

    # Transform words to indices.
    word_to_index = ds.get_word_indices()
    emails = [list(map(word_to_index.get, s)) for s in emails]

    # Count how many times each word appears with the i-th class.
    counts = np.zeros((len(ds.vocabulary), len(set(classes))))
    for i, e in enumerate(emails):
        for w in e:
            counts[w, classes[i]] += 1

    # emails = ds.bag_of_words(emails)  # with bag-of-words, counts are not needed

    # Emails may differ in length; recent NumPy versions reject ragged
    # input unless dtype=object is given.
    emails = np.array(emails, dtype=object)
    if return_dataset:
        return emails, np.array(classes), counts, ds
    return emails, np.array(classes), counts
Developer: mtreviso, Project: university, Lines of code: 28, Source: reader.py
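
The Dataset class itself is not shown in this article. As a rough illustration of the pattern Example 1 relies on, the sketch below uses a hypothetical ToyDataset (not the project's real class) whose vocabulary grows through repeated build_vocabulary calls and whose get_word_indices returns a word-to-index mapping. The internal representation, the helper count_words_per_class, and the toy data are all assumptions made for illustration.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in for dataset.Dataset; the real class may differ."""

    def __init__(self):
        self.vocabulary = []      # words in first-seen order
        self._word_to_index = {}  # word -> row index

    def build_vocabulary(self, sentences):
        # Add every unseen word; repeated calls extend the same vocabulary.
        for word in sentences:
            if word not in self._word_to_index:
                self._word_to_index[word] = len(self.vocabulary)
                self.vocabulary.append(word)

    def get_word_indices(self):
        return self._word_to_index


def count_words_per_class(emails, classes, ds):
    # Rows are vocabulary words, columns are classes (integer ids assumed).
    word_to_index = ds.get_word_indices()
    counts = np.zeros((len(ds.vocabulary), len(set(classes))))
    for words, label in zip(emails, classes):
        for word in words:
            counts[word_to_index[word], label] += 1
    return counts


# Made-up data: 0 and 1 are arbitrary integer class ids.
emails = [["buy", "now", "buy"], ["see", "you", "now"]]
classes = [0, 1]

ds = ToyDataset()
for words in emails:
    ds.build_vocabulary(words)

counts = count_words_per_class(emails, classes, ds)
```

Here counts has one row per vocabulary word and one column per class, which is the same shape Example 1 builds with np.zeros((len(ds.vocabulary), len(set(classes)))).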

Example 2: read_file_with_dataset

import random
import numpy as np
from dataset import Dataset  # project-local module from the source repository

def read_file_with_dataset(dirname, ds_orig, return_dataset=False):
    # return_dataset is accepted but not used in this function.
    ds = Dataset(dirname, ['spam', 'ham'])
    emails = []

    for sentences in ds.get_text_file():
        ds.build_vocabulary(sentences)
        emails.append(sentences)

    # Transform words to indices using the vocabulary of the original dataset.
    dic = ds_orig.get_word_indices()
    for i, s in enumerate(emails):
        for j, x in enumerate(s):
            if x in dic:
                emails[i][j] = dic[x]
            else:
                # The word was not seen before, so pick a random existing index.
                emails[i][j] = random.randint(0, len(dic) - 1)

    print(emails)
    # Emails may differ in length; recent NumPy versions reject ragged
    # input unless dtype=object is given.
    return np.array(emails, dtype=object)
Developer: mtreviso, Project: university, Lines of code: 22, Source: reader.py
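
Example 2 maps each word that is missing from the original vocabulary to a randomly chosen existing index. The sketch below isolates that lookup on made-up data and contrasts it with a different technique, a single reserved unknown-word index, which is not what the source code does. The function names encode_random_fallback and encode_with_unk and the toy vocabulary are hypothetical, and the vocabulary indices are assumed to run contiguously from 0.

```python
import random


def encode_random_fallback(words, word_to_index, rng):
    # Same idea as Example 2: unseen words get a random existing index.
    return [
        word_to_index[w] if w in word_to_index
        else rng.randint(0, len(word_to_index) - 1)  # both ends inclusive
        for w in words
    ]


def encode_with_unk(words, word_to_index):
    # Alternative (not from the source): one extra index for all unseen words.
    unk_index = len(word_to_index)
    return [word_to_index.get(w, unk_index) for w in words]


word_to_index = {"buy": 0, "now": 1, "see": 2}  # made-up vocabulary
words = ["buy", "later", "now"]                 # "later" is out of vocabulary

random_encoded = encode_random_fallback(words, word_to_index, random.Random(0))
unk_encoded = encode_with_unk(words, word_to_index)
```

The random fallback keeps the index range unchanged but makes an unseen word indistinguishable from a real one; a reserved index avoids that at the cost of one extra row in any count matrix built from the vocabulary.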


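Note: The dataset.Dataset.build_vocabulary examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippets were selected from open-source projects, and copyright of the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.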