This page collects typical usage examples of the readability.Document.content method in Python. If you are wondering how Document.content is used in practice, the selected examples below may help. You can also look further into usage examples of readability.Document, the class that provides the method.
Two code examples of Document.content are shown below, sorted by popularity by default.
Example 1: parse_item
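# Required import: from readability import Document
# content() is a method of Document, so call it on an instance; it cannot be imported on its own.
import hashlib

from readability import Document

def parse_item(self, response):
    # Spider callback; BeerReviewPage is the project's own item class and is not shown here.
    # Name the saved file after the SHA-1 hash of the page URL.
    filename = hashlib.sha1(response.url.encode()).hexdigest()
    readability_document = Document(response.body, url=response.url)
    item = BeerReviewPage()
    item['url'] = response.url
    item['filename'] = filename
    item['depth'] = response.meta['depth']
    item['link_text'] = response.meta['link_text']
    item['title'] = readability_document.short_title()
    # Save the document body returned by content(). Under Python 3 it is text,
    # so the file is opened in text mode. The data/ directory must already exist.
    with open('data/' + filename + '.html', 'w', encoding='utf-8') as html_file:
        html_file.write(readability_document.content())
    print('(' + filename + ') ' + item['title'] + " : " + item['url'])
    return item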
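The file-naming and saving steps of parse_item can be tried outside a spider. The following is a minimal sketch under stated assumptions, not the original project's code: url_to_filename, save_body and FakeDocument are hypothetical names introduced here, and FakeDocument only mimics the content() method so that the sketch runs without readability installed. With the library available, a real Document instance would be passed instead.

```python
import hashlib
import os
import tempfile


def url_to_filename(url):
    """Return the SHA-1 hex digest of the URL, as parse_item does."""
    return hashlib.sha1(url.encode()).hexdigest()


def save_body(doc, url, out_dir):
    """Write doc.content() to <out_dir>/<sha1 of url>.html and return the path.

    `doc` is any object with a content() method that returns text.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, url_to_filename(url) + '.html')
    with open(path, 'w', encoding='utf-8') as html_file:
        html_file.write(doc.content())
    return path


class FakeDocument:
    """Hypothetical stand-in for readability.Document, for illustration only."""

    def __init__(self, html):
        self._html = html

    def content(self):
        return self._html


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as out_dir:
        saved = save_body(FakeDocument('<body><p>Hello</p></body>'),
                          'http://example.com/review/1', out_dir)
        print(saved)
```

Hashing the URL gives a fixed-length name that contains no characters the file system would reject, and the same URL always maps to the same file.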
Example 2: extract_content_texts
# Required import: from readability import Document
# content() is a method of Document, so call it on an instance; it cannot be imported on its own.
import glob
import json
import logging
import os

from readability import Document

def extract_content_texts(name):
    # DEFAULT_SAVE_PATH and mkdir_p come from the surrounding project and are not shown here.
    article_archive = os.path.join(DEFAULT_SAVE_PATH, name, 'raw_articles')
    json_archive = os.path.join(DEFAULT_SAVE_PATH, name, 'json_articles')
    mkdir_p(json_archive)
    for html in glob.glob(os.path.join(article_archive, '*.html')):
        fname = os.path.basename(html) + '.json'
        savepath = os.path.join(json_archive, fname)
        # Skip pages that were already converted on an earlier run.
        if os.path.exists(savepath):
            logging.info('Skipping existing json data: {0}'.format(savepath))
            continue
        data = {}
        with open(html, 'r') as myfile:
            doc = Document(myfile.read())
        # Store the title, the document body and the extracted main article.
        data['title'] = doc.title()
        data['content'] = doc.content()
        data['summary'] = doc.summary()
        with open(savepath, 'w') as saving:
            json.dump(data, saving)
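The skip-if-already-converted loop of extract_content_texts can be exercised without the project's helpers. The following is a minimal sketch under stated assumptions: extract_to_json, its make_doc parameter and FakeDocument are hypothetical names introduced here, os.makedirs stands in for the project's mkdir_p, and FakeDocument simply echoes its input so that the sketch runs without readability installed. With the library available, readability.Document could be passed as make_doc.

```python
import glob
import json
import os
import tempfile


def extract_to_json(html_dir, json_dir, make_doc):
    """Convert every *.html file in html_dir to a JSON file in json_dir.

    make_doc is a callable that turns HTML text into an object offering
    title(), content() and summary(). Returns the JSON paths written on
    this run; files that already exist are skipped.
    """
    os.makedirs(json_dir, exist_ok=True)
    written = []
    for html_path in sorted(glob.glob(os.path.join(html_dir, '*.html'))):
        save_path = os.path.join(json_dir, os.path.basename(html_path) + '.json')
        if os.path.exists(save_path):
            continue  # already converted on an earlier run
        with open(html_path, 'r', encoding='utf-8') as html_file:
            doc = make_doc(html_file.read())
        data = {
            'title': doc.title(),
            'content': doc.content(),
            'summary': doc.summary(),
        }
        with open(save_path, 'w', encoding='utf-8') as json_file:
            json.dump(data, json_file)
        written.append(save_path)
    return written


class FakeDocument:
    """Hypothetical stand-in for readability.Document, for illustration only."""

    def __init__(self, html):
        self._html = html

    def title(self):
        return 'untitled'

    def content(self):
        return self._html

    def summary(self):
        return self._html


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        html_dir = os.path.join(root, 'raw_articles')
        os.makedirs(html_dir)
        with open(os.path.join(html_dir, 'page.html'), 'w', encoding='utf-8') as f:
            f.write('<body><p>Hello</p></body>')
        json_dir = os.path.join(root, 'json_articles')
        print(len(extract_to_json(html_dir, json_dir, FakeDocument)))
        print(len(extract_to_json(html_dir, json_dir, FakeDocument)))
```

Because existing output files are skipped, the function can be rerun after an interruption and only converts the pages that are still missing.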