

Python Fetcher.raw_fetch_url Method Code Examples

This page collects typical usage examples of the Python method fetcher.Fetcher.raw_fetch_url. If you want to know what Fetcher.raw_fetch_url does, how to call it, or what real-world calls to it look like, the selected example here may help. You can also look at further usage examples of the class it belongs to, fetcher.Fetcher.


One code example of the Fetcher.raw_fetch_url method is shown below. Examples are sorted by popularity by default.
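The page does not show how Fetcher.raw_fetch_url itself is implemented. The example only relies on it returning the HTML text of a URL, or a falsy value when the fetch fails. To illustrate that contract, here is a minimal standard-library stand-in. It is hypothetical: the function name mirrors the method, but the timeout parameter, the UTF-8 fallback and returning None on failure are assumptions, not Reynir's code.

```python
from urllib.request import urlopen


def raw_fetch_url(url, timeout=10.0):
    """Return the decoded body found at url, or None if it cannot be fetched.

    Hypothetical stand-in for Fetcher.raw_fetch_url: only the return
    contract (text on success, a falsy value on failure) comes from the example.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            # Use the charset announced by the server, defaulting to UTF-8
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError):
        # OSError covers URLError, HTTPError and socket timeouts;
        # ValueError covers malformed URLs; LookupError an unknown charset name.
        return None
```

Because every failure is mapped to None, a caller can use the same "if not html_doc" check that urls2fetch performs on the result.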

Example 1: urls2fetch

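# Modules required: logging, feedparser, and Fetcher from the project's fetcher module
import logging

import feedparser
from fetcher import Fetcher


class Scraper:  # enclosing class in scraper.py (name assumed; other methods omitted)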
    def urls2fetch(self, root, helper):
        """ Returns a set of URLs to fetch. If the scraper helper class has
            associated RSS feed URLs, these are used to acquire article URLs.
            Otherwise, the URLs are found by scraping the root website and
            searching for links to subpages. """
        fetch_set = set()
        feeds = helper.feeds

        if feeds:
            for feed_url in feeds:
                logging.info("Fetching feed {0}".format(feed_url))
                try:
                    d = feedparser.parse(feed_url)
                except Exception as e:
                    logging.warning(
                        "Error fetching/parsing feed {0}: {1}".format(feed_url, str(e))
                    )
                    continue

                for entry in d.entries:
                    # Not every feed entry carries a link; .get() avoids an AttributeError
                    link = entry.get("link")
                    if link and not helper.skip_rss_entry(entry):
                        fetch_set.add(link)
        else:
            # Fetch the root URL and scrape all child URLs
            # that refer to the same domain suffix
            logging.info("Fetching root {0}".format(root.url))

            # Read the HTML document at the root URL
            html_doc = Fetcher.raw_fetch_url(root.url)
            if not html_doc:
                logging.warning("Unable to fetch root {0}".format(root.url))
                return fetch_set  # empty set, so the return type matches the docstring

            # Parse the HTML document
            soup = Fetcher.make_soup(html_doc)

            # Obtain the set of child URLs to fetch
            fetch_set = Fetcher.children(root, soup)

        return fetch_set
Developer: vthorsteinsson, Project: Reynir, Lines of code: 42, Source: scraper.py
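urls2fetch chooses between two strategies: collect article links from the helper's RSS feeds, or scrape the root page for links on the same domain. The bodies of Fetcher.make_soup and Fetcher.children are not shown, so the following stdlib-only sketch is an approximation under stated assumptions. It replaces feedparser with xml.etree.ElementTree (plain RSS 2.0 only, no Atom) and the soup with html.parser, and it reads "same domain suffix" as the root host minus a leading "www." plus any of its subdomains. The names rss_links and same_domain_children, and the skip callback standing in for helper.skip_rss_entry, are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


def rss_links(rss_xml, skip=lambda item: False):
    """Return the set of <link> URLs of the <item> elements in an RSS 2.0 string.

    skip(item) plays the role of helper.skip_rss_entry(entry) (hypothetical).
    """
    links = set()
    for item in ET.fromstring(rss_xml).iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and not skip(item):
            links.add(link)
    return links


class _AnchorCollector(HTMLParser):
    """Records the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_domain_children(root_url, html_doc):
    """Return absolute http(s) URLs linked from html_doc on the root's domain."""
    host = urlparse(root_url).hostname or ""
    # Assumed rule: a root at www.example.com matches example.com and *.example.com
    suffix = host[4:] if host.startswith("www.") else host
    collector = _AnchorCollector()
    collector.feed(html_doc)
    collector.close()
    children = set()
    for href in collector.hrefs:
        # Resolve relative links against the root and drop any #fragment
        url, _fragment = urldefrag(urljoin(root_url, href))
        parts = urlparse(url)
        link_host = parts.hostname or ""
        if parts.scheme in ("http", "https") and (
            link_host == suffix or link_host.endswith("." + suffix)
        ):
            children.add(url)
    return children
```

Dropping the #fragment makes two anchors into the same page count as one URL to fetch, and collecting into a set removes repeated links, as fetch_set does in the original.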


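Note: The fetcher.Fetcher.raw_fetch_url example on this page was compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippets come from open-source projects contributed by their developers, and the copyright of the source code belongs to the original authors. For redistribution and use, refer to the license of the corresponding project; do not republish without permission.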