This article collects typical usage examples of the scrapy.spiders.CrawlSpider class in Python. If you are wondering how to use spiders.CrawlSpider, or looking for practical examples of it, the curated code samples below may help. You can also explore the containing module, scrapy.spiders, for further usage.
Three code examples of spiders.CrawlSpider are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your feedback helps the system recommend better Python code examples.
Example 1: parse_request
# Required import: from scrapy import spiders [as alias]
# Or: from scrapy.spiders import CrawlSpider [as alias]
from scrapy.spiders import CrawlSpider
# request_to_dict is available from scrapy.utils.reqser (Scrapy < 2.6)
from scrapy.utils.reqser import request_to_dict

def parse_request(request, spider):
    # Serialize the Request to a plain dict, resolving the callback name
    _request = request_to_dict(request, spider=spider)
    if not _request['callback']:
        _request['callback'] = 'parse'
    elif isinstance(spider, CrawlSpider):
        # CrawlSpider requests carry the index of the Rule that produced them
        rule = request.meta.get('rule')
        if rule is not None:
            _request['callback'] = spider.rules[rule].callback
    # clean_headers and parse_object are helpers defined elsewhere in the project
    clean_headers(_request['headers'], spider.settings)
    _meta = {}
    for key, value in _request.get('meta').items():
        if key != '_autounit':
            _meta[key] = parse_object(value, spider)
    _request['meta'] = _meta
    return _request
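The meta-filtering step at the end of parse_request can be illustrated in isolation. The sketch below uses plain dicts instead of a real Scrapy request, and stands in for parse_object with an identity transform; the sample meta keys are hypothetical.

```python
# Minimal sketch of the meta-filtering loop above, with plain dicts in place
# of a Scrapy request and parse_object assumed to be the identity function.
def filter_meta(meta):
    # Drop the internal '_autounit' bookkeeping key, keep everything else.
    return {key: value for key, value in meta.items() if key != '_autounit'}

filtered = filter_meta({'rule': 0, '_autounit': {'cassette': 'x'}, 'depth': 2})
# filtered is {'rule': 0, 'depth': 2}
```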
Example 2: get_filter_attrs
# Required import: from scrapy import spiders [as alias]
# Or: from scrapy.spiders import CrawlSpider [as alias]
from scrapy.spiders import CrawlSpider

def get_filter_attrs(spider):
    # Attributes managed by the framework, to be excluded from snapshots
    attrs = {'crawler', 'settings', 'start_urls'}
    if isinstance(spider, CrawlSpider):
        # CrawlSpider additionally carries its rules (raw and compiled)
        attrs |= {'rules', '_rules'}
    return attrs
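A plausible use of the attribute set returned by get_filter_attrs is to strip framework-managed attributes when recording a spider's own state. The sketch below is an assumption, not part of the original project; the spider attributes shown are hypothetical.

```python
# Hedged sketch: filtering a spider's attribute dict with a set like the one
# get_filter_attrs returns. All attribute names here are illustrative.
def filter_spider_attrs(spider_vars, filter_attrs):
    # Keep only attributes that are not framework-managed.
    return {k: v for k, v in spider_vars.items() if k not in filter_attrs}

# Same union as in get_filter_attrs for a CrawlSpider
attrs = {'crawler', 'settings', 'start_urls'} | {'rules', '_rules'}
state = filter_spider_attrs(
    {'start_urls': ['http://example.com'], 'custom_flag': True}, attrs)
# state is {'custom_flag': True}
```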
Example 3: parse_start_url
# Required import: from scrapy import spiders [as alias]
# Or: from scrapy.spiders import CrawlSpider [as alias]
import scrapy

def parse_start_url(self, response):
    """By default, CrawlSpider first issues Requests from start_urls, then calls back parse_start_url with the responses."""
    li_list = response.xpath('//*[@id="post_container"]/li')
    for li_div in li_list:
        # Extract each post's detail-page link and follow it
        link = li_div.xpath('.//div[@class="thumbnail"]/a/@href').extract_first()
        yield scrapy.Request(link, callback=self.parse_detail_url)
    # Follow the "next" pagination link back into this same callback
    next_page = response.xpath('//div[@class="pagination"]/a[@class="next"]/@href').extract_first()
    if next_page:
        yield scrapy.Request(next_page, callback=self.parse_start_url)
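The crawl pattern in Example 3 (collect item links on a page, then follow the "next" link back into the same callback) can be sketched without Scrapy at all. The stubbed pages and link names below are hypothetical stand-ins for real responses.

```python
# Self-contained sketch of the paginate-and-follow pattern above, with a
# stubbed site in place of real HTTP responses. Page and item names are
# made up for illustration.
pages = {
    'page1': (['item-a', 'item-b'], 'page2'),  # (item links, next-page link)
    'page2': (['item-c'], None),               # last page: no "next" link
}

def crawl(page, collected=None):
    collected = [] if collected is None else collected
    links, next_page = pages[page]
    collected.extend(links)   # the real spider yields a Request per item link
    if next_page:             # follow "next", re-entering this same callback
        crawl(next_page, collected)
    return collected

items = crawl('page1')
# items is ['item-a', 'item-b', 'item-c']
```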