This article collects typical usage examples of the Python method fetcher.Fetcher.raw_fetch_url. If you are wondering what Fetcher.raw_fetch_url does, how to call it, or what a working example looks like, the selected code example below may help. You can also look further into usage examples of the containing class, fetcher.Fetcher.
One code example of the Fetcher.raw_fetch_url method is shown below.
Example 1: urls2fetch
# Required import: from fetcher import Fetcher [as alias]
# Or: from fetcher.Fetcher import raw_fetch_url [as alias]
import logging

import feedparser
from fetcher import Fetcher


def urls2fetch(self, root, helper):
    """ Returns a set of URLs to fetch. If the scraper helper class has
        associated RSS feed URLs, these are used to acquire article URLs.
        Otherwise, the URLs are found by scraping the root website and
        searching for links to subpages. """
    fetch_set = set()
    feeds = helper.feeds
    if feeds:
        for feed_url in feeds:
            logging.info("Fetching feed {0}".format(feed_url))
            try:
                d = feedparser.parse(feed_url)
            except Exception as e:
                logging.warning(
                    "Error fetching/parsing feed {0}: {1}".format(feed_url, str(e))
                )
                continue
            for entry in d.entries:
                if entry.link and not helper.skip_rss_entry(entry):
                    fetch_set.add(entry.link)
    else:
        # Fetch the root URL and scrape all child URLs
        # that refer to the same domain suffix
        logging.info("Fetching root {0}".format(root.url))
        # Read the HTML document at the root URL
        html_doc = Fetcher.raw_fetch_url(root.url)
        if not html_doc:
            logging.warning("Unable to fetch root {0}".format(root.url))
            # Note: this path returns None rather than an empty set
            return None
        # Parse the HTML document
        soup = Fetcher.make_soup(html_doc)
        # Obtain the set of child URLs to fetch
        fetch_set = Fetcher.children(root, soup)
    return fetch_set
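
The fallback branch above relies on Fetcher.children to pick out links that stay on the root's domain, but that method's implementation is not shown here. The following is only a minimal sketch of the general idea, using the standard library instead of the project's own helpers. The names LinkCollector and same_domain_children are hypothetical, and the sketch simplifies "same domain suffix" to an exact host match, which may be stricter than what Fetcher.children actually does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_children(root_url, html_doc):
    """Return the set of absolute http(s) links in html_doc that share
    the host of root_url (an assumption; the original may match suffixes)."""
    root_host = urlparse(root_url).netloc.lower()
    parser = LinkCollector()
    parser.feed(html_doc)
    children = set()
    for href in parser.hrefs:
        # Resolve relative links against the root and drop any #fragment
        absolute = urljoin(root_url, href).split("#", 1)[0]
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc.lower() == root_host:
            children.add(absolute)
    return children


html = (
    '<a href="/news/1">A</a> '
    '<a href="https://other.example/x">B</a> '
    '<a href="mailto:a@b.c">C</a>'
)
links = same_domain_children("https://example.com/", html)
```

Collecting into a set mirrors fetch_set in the example: duplicate links on the page collapse to a single entry, so each child URL would be fetched at most once.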