

Python Document.content Method Code Examples

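This page collects typical usage examples of the readability.Document.content method in Python. Document.content() returns the HTML of the parsed page's body, whereas Document.summary() returns only the extracted main article. If you want to know how Document.content is used in practice, the selected examples below may help. You can also look at other usage examples of the class it belongs to, readability.Document.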


Two code examples of the Document.content method are shown below, ordered by popularity.

Example 1: parse_item

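# Required imports (the readability package is installed as readability-lxml)
import hashlib
import os

from readability import Document


# A method of the project's Scrapy spider; BeerReviewPage is the project's own item class
def parse_item(self, response):
    # Name the saved file after the SHA-1 hex digest of the page URL
    filename = hashlib.sha1(response.url.encode()).hexdigest()
    readability_document = Document(response.body, url=response.url)
    item = BeerReviewPage()
    item['url'] = response.url
    item['filename'] = filename
    item['depth'] = response.meta['depth']  # set by Scrapy's DepthMiddleware
    item['link_text'] = response.meta['link_text']  # custom meta key set elsewhere in the spider
    item['title'] = readability_document.short_title()
    # content() returns the page body; encode text to UTF-8 because the file is opened in binary mode
    body = readability_document.content()
    if isinstance(body, str):
        body = body.encode('utf-8')
    os.makedirs('data', exist_ok=True)
    with open(os.path.join('data', filename + '.html'), 'wb') as html_file:
        html_file.write(body)
    print('(' + filename + ') ' + item['title'] + ' : ' + item['url'])
    return item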
Developer ID: anoras, Project: BeerGeek, Lines of code: 15, Source: BeerGeekSpider.py
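Example 1 names each saved page after the SHA-1 hex digest of its URL and writes the result of content() to a file opened in binary mode. That worked in the Python 2 era, when content() produced a byte string; in current Python 3 releases of readability-lxml it appears to return a text string, which a binary file will not accept. The stdlib-only sketch below isolates that naming and saving step. The helper names page_filename and save_html are hypothetical, and accepting either str or bytes is an assumption meant to cover both behaviors.

```python
import hashlib
import os


def page_filename(url: str) -> str:
    """Derive a stable file name from a URL (SHA-1 hex digest, as in Example 1)."""
    return hashlib.sha1(url.encode()).hexdigest()


def save_html(body, directory: str, url: str) -> str:
    """Write an HTML body (str or bytes) to <directory>/<sha1-of-url>.html.

    Hypothetical helper: text is encoded as UTF-8 so the file can always be
    opened in binary mode, whichever type Document.content() returned.
    """
    os.makedirs(directory, exist_ok=True)
    if isinstance(body, str):
        body = body.encode("utf-8")
    path = os.path.join(directory, page_filename(url) + ".html")
    with open(path, "wb") as html_file:
        html_file.write(body)
    return path


if __name__ == "__main__":
    import tempfile

    # Usage: save a small body under a temporary directory and show the file name
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_html("<body><p>Stout</p></body>", tmp, "http://example.com/beer/1")
        print(os.path.basename(saved))
```

Hashing the URL gives every page a fixed-length, filesystem-safe name, and the same URL always maps to the same file, so a re-crawl overwrites the earlier copy instead of creating a duplicate.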

Example 2: extract_content_texts

# Required imports (the readability package is installed as readability-lxml)
import glob
import json
import logging
import os

from readability import Document

# DEFAULT_SAVE_PATH is a constant defined elsewhere in the project


def extract_content_texts(name):
    article_archive = os.path.join(DEFAULT_SAVE_PATH, name, 'raw_articles')
    json_archive = os.path.join(DEFAULT_SAVE_PATH, name, 'json_articles')
    os.makedirs(json_archive, exist_ok=True)  # stands in for the project's mkdir_p helper
    for html in glob.glob(os.path.join(article_archive, '*.html')):
        fname = os.path.basename(html) + '.json'
        savepath = os.path.join(json_archive, fname)
        if os.path.exists(savepath):
            logging.info('Skipping existing json data: %s', savepath)
            continue
        # Read raw bytes so readability can detect the page encoding itself
        with open(html, 'rb') as myfile:
            doc = Document(myfile.read())
        data = {
            'title': doc.title(),
            'content': doc.content(),  # HTML of the page body
            'summary': doc.summary(),  # extracted main article
        }
        with open(savepath, 'w', encoding='utf-8') as saving:
            json.dump(data, saving)
Developer ID: gregjan, Project: bullshit-detector, Lines of code: 20, Source: wbm_api.py
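Example 2 maps every <name>.html file in the raw_articles folder to <name>.html.json in the json_articles folder, skips files whose output already exists so the job can be re-run after an interruption, and stores title(), content() and summary() together as one JSON object. The stdlib-only sketch below separates that bookkeeping from the parsing. pending_articles and write_record are hypothetical helpers; write_record assumes only that its argument has title(), content() and summary() methods, as readability.Document does in Example 2, and it assumes those methods return text.

```python
import glob
import json
import os
from typing import Iterator, Tuple


def pending_articles(article_dir: str, json_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (html_path, json_path) pairs whose JSON output does not exist yet.

    Hypothetical helper mirroring Example 2: 'story.html' maps to
    'story.html.json' because '.json' is appended to the whole base name.
    """
    os.makedirs(json_dir, exist_ok=True)
    for html_path in sorted(glob.glob(os.path.join(article_dir, "*.html"))):
        json_path = os.path.join(json_dir, os.path.basename(html_path) + ".json")
        if os.path.exists(json_path):
            continue  # processed on an earlier run, so skip it
        yield html_path, json_path


def write_record(doc, json_path: str) -> dict:
    """Save doc.title(), doc.content() and doc.summary() as one JSON object.

    Assumes 'doc' exposes those three methods and that they return str;
    any object with the same methods works.
    """
    record = {
        "title": doc.title(),
        "content": doc.content(),
        "summary": doc.summary(),
    }
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump(record, out)
    return record
```

Wired into Example 2, the loop would iterate over pending_articles(article_archive, json_archive), build a Document from each html_path read in binary mode, and pass it to write_record along with json_path.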


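Note: The readability.Document.content examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets are drawn from open-source projects contributed by their developers, and copyright remains with the original authors; consult each project's license before distributing or using the code. Do not republish without permission.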