Python Fetcher.children Method Code Examples

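This article collects typical usage examples of the fetcher.Fetcher.children method in Python. If you are wondering how Fetcher.children is used in practice, or are looking for a working example of it, the selected code below may help. You can also look at further usage examples of its enclosing class, fetcher.Fetcher.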

One code example of the Fetcher.children method is shown below. Examples are sorted by popularity by default.

Example 1: urls2fetch

# Required imports for the method below
import logging

import feedparser
from fetcher import Fetcher
    def urls2fetch(self, root, helper):
        """ Returns a set of URLs to fetch. If the scraper helper class has
            associated RSS feed URLs, these are used to acquire article URLs.
            Otherwise, the URLs are found by scraping the root website and
            searching for links to subpages. """
        fetch_set = set()
        feeds = helper.feeds

        if feeds:
            for feed_url in feeds:
                logging.info("Fetching feed {0}".format(feed_url))
                try:
                    d = feedparser.parse(feed_url)
                except Exception as e:
                    logging.warning(
                        "Error fetching/parsing feed {0}: {1}".format(feed_url, str(e))
                    )
                    continue

                for entry in d.entries:
                    # Use .get() so an entry without a link is skipped, not an error
                    if entry.get("link") and not helper.skip_rss_entry(entry):
                        fetch_set.add(entry.link)
        else:
            # Fetch the root URL and scrape all child URLs
            # that refer to the same domain suffix
            logging.info("Fetching root {0}".format(root.url))

            # Read the HTML document at the root URL
            html_doc = Fetcher.raw_fetch_url(root.url)
            if not html_doc:
                logging.warning("Unable to fetch root {0}".format(root.url))
                return fetch_set  # still empty here, so callers always get a set

            # Parse the HTML document
            soup = Fetcher.make_soup(html_doc)

            # Obtain the set of child URLs to fetch
            fetch_set = Fetcher.children(root, soup)

        return fetch_set

Developer: vthorsteinsson, Project: Reynir, Lines of code: 42, Source file: scraper.py
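
The example above calls Fetcher.children(root, soup) but does not show what it does internally. As a rough illustration only, the following sketch shows one way a "collect child URLs on the same domain suffix" step could work. It is not Reynir's implementation: the names child_urls and _LinkCollector are hypothetical, it uses only the standard library instead of the soup object, and the suffix rule (same host or any subdomain of it) is an assumption based on the comment in the example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def child_urls(root_url, html_doc):
    """Hypothetical stand-in for Fetcher.children (an assumption, not the
    real method): return the set of absolute http(s) URLs linked from
    html_doc whose host equals, or is a subdomain of, root_url's host."""
    root_host = urlparse(root_url).netloc.lower()
    collector = _LinkCollector()
    collector.feed(html_doc)

    result = set()
    for href in collector.hrefs:
        # Resolve relative links against the root and drop any #fragment
        absolute, _fragment = urldefrag(urljoin(root_url, href))
        parts = urlparse(absolute)
        host = parts.netloc.lower()
        same_suffix = host == root_host or host.endswith("." + root_host)
        if parts.scheme in ("http", "https") and same_suffix:
            result.add(absolute)
    return result


if __name__ == "__main__":
    sample = (
        '<a href="/news/1">one</a>'
        '<a href="https://other.org/x">elsewhere</a>'
    )
    for url in sorted(child_urls("https://example.com/", sample)):
        print(url)
```

Returning a set mirrors how urls2fetch treats the result: duplicate links collapse automatically, and an empty set signals that nothing was found.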

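Note: The fetcher.Fetcher.children example in this article was compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippet was selected from an open-source project, and copyright in the source code belongs to its original author. For distribution and use, refer to the corresponding project's License. Do not reproduce without permission.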