This article collects typical usage examples of the Python method mrjob.emr.EMRJobRunner.get_s3_keys. If you are wondering how EMRJobRunner.get_s3_keys is used in practice, the examples selected here may help. You can also look further into usage examples of the enclosing class, mrjob.emr.EMRJobRunner.
Two code examples of EMRJobRunner.get_s3_keys are shown below, sorted by popularity by default.
Example 1: reducer_init
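# Required imports (the example appears to target Python 2 and an older,
# boto 2-based mrjob release):
import json
from StringIO import StringIO

from mrjob.emr import EMRJobRunner


def reducer_init(self):
    # AWS_ACCESS_KEY and AWS_SECRET_KEY are assumed to be defined elsewhere.
    emr = EMRJobRunner(aws_access_key_id=AWS_ACCESS_KEY,
                       aws_secret_access_key=AWS_SECRET_KEY)
    idf_parts = emr.get_s3_keys('s3://6885public/jeffchan/term-idfs/')
    self.word_to_idf = dict()
    for part in idf_parts:
        # Do not name this variable `json`: that would shadow the json module
        # and make the json.loads() call below fail.
        contents = part.get_contents_as_string()
        for line in StringIO(contents):
            pair = json.loads(line)
            self.word_to_idf[pair['term']] = pair['idf']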
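The parsing step in this example does not depend on S3 itself, so it can be sketched and checked in isolation. The following is a minimal sketch, assuming each part is an object exposing get_contents_as_string() and that every non-empty line is a JSON object with "term" and "idf" fields. FakeKey and load_term_idfs are hypothetical names introduced here for illustration; they are not part of mrjob or boto.

```python
import json


class FakeKey(object):
    """Hypothetical stand-in for an S3 key object; not part of mrjob or boto."""

    def __init__(self, contents):
        self._contents = contents

    def get_contents_as_string(self):
        return self._contents


def load_term_idfs(parts):
    """Build a term -> idf dict from key-like objects holding JSON lines."""
    word_to_idf = {}
    for part in parts:
        contents = part.get_contents_as_string()
        # Key contents may arrive as bytes; decode so lines can be split as text.
        if isinstance(contents, bytes):
            contents = contents.decode("utf-8")
        for line in contents.splitlines():
            if not line.strip():
                continue  # skip blank lines, e.g. a trailing newline
            pair = json.loads(line)
            word_to_idf[pair["term"]] = pair["idf"]
    return word_to_idf


parts = [
    FakeKey('{"term": "apple", "idf": 1.5}\n{"term": "pear", "idf": 2.0}\n'),
    FakeKey(b'{"term": "plum", "idf": 0.25}\n'),
]
idfs = load_term_idfs(parts)
print(sorted(idfs.items()))
```

Keeping the parsing in a plain function like this makes it possible to test the reducer setup without AWS credentials; in a real job, the list of FakeKey objects would be replaced by the keys returned from get_s3_keys.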
Example 2: reducer_init
# Required imports (Python 2 style, matching the StringIO usage below):
from StringIO import StringIO

from mrjob.emr import EMRJobRunner
from mrjob.protocol import JSONValueProtocol


def reducer_init(self):
    self.idfs = {}
    # Use the credentials passed as job options when both are present;
    # otherwise let EMRJobRunner pick up credentials from its own configuration.
    if self.options.aws_access_key_id and self.options.aws_secret_access_key:
        emr = EMRJobRunner(aws_access_key_id=self.options.aws_access_key_id,
                           aws_secret_access_key=self.options.aws_secret_access_key)
    else:
        emr = EMRJobRunner()
    # Iterate through the files in the bucket provided by the user
    for key in emr.get_s3_keys("s3://" + self.options.idf_loc):
        # Load the whole file first, then read it line by line; otherwise,
        # chunks may not end on line boundaries
        for line in StringIO(key.get_contents_as_string()):
            # read() returns a (key, value) pair; keep the parsed JSON value
            term_idf = JSONValueProtocol.read(line)[1]
            self.idfs[term_idf['term']] = term_idf['idf']
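The comment in this example notes that reading a file in chunks can split a line in the middle, which is why the whole file is loaded before it is parsed. If a file were too large to load at once, one alternative (not what the example above does) would be to buffer the unfinished tail of each chunk until the rest of the line arrives. The following is a minimal sketch of that idea; iter_lines is a hypothetical helper name, and the chunks are plain strings rather than real S3 reads.

```python
def iter_lines(chunks):
    """Yield complete lines from an iterable of arbitrarily split text chunks."""
    buffered = ""
    for chunk in chunks:
        buffered += chunk
        lines = buffered.split("\n")
        # The last element is either "" or an unfinished line; keep it buffered.
        buffered = lines.pop()
        for line in lines:
            yield line
    # Emit whatever remains if the input did not end with a newline.
    if buffered:
        yield buffered


# Chunk boundaries deliberately fall in the middle of JSON objects.
chunks = ['{"term": "a", "i', 'df": 1}\n{"term"', ': "b", "idf": 2}\n']
lines = list(iter_lines(chunks))
print(lines)
```

Each yielded line can then be passed to the same per-line parsing used in the example, so only one chunk plus one partial line needs to be held in memory at a time.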