This article collects typical usage examples of scrapy.spiders.CrawlSpider in Python. If you are unsure how spiders.CrawlSpider is used in practice or what it is good for, the curated code examples below may help; you can also consult the documentation of the module it lives in, scrapy.spiders.
Three code examples of spiders.CrawlSpider are shown below, sorted by popularity by default.
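For orientation before the examples, here is a minimal, generic CrawlSpider sketch; the spider name, URL, and link regex are placeholders and not taken from the examples below:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ExampleSpider(CrawlSpider):            # placeholder spider
    name = 'example'
    start_urls = ['https://example.com/']
    # Each Rule extracts links and routes matching responses to a callback.
    rules = (
        Rule(LinkExtractor(allow=r'/item/'), callback='parse_item'),
    )

    def parse_item(self, response):
        yield {'url': response.url}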
Example 1: parse_request
# Required import: from scrapy import spiders [as alias]
# Or: from scrapy.spiders import CrawlSpider [as alias]
def parse_request(request, spider):
    # Serialize the request into a plain dict (request_to_dict lives in
    # scrapy.utils.reqser; it moved to scrapy.utils.request in Scrapy >= 2.6).
    _request = request_to_dict(request, spider=spider)
    if not _request['callback']:
        _request['callback'] = 'parse'
    elif isinstance(spider, CrawlSpider):
        # For CrawlSpider requests, resolve the callback from the matched rule.
        rule = request.meta.get('rule')
        if rule is not None:
            _request['callback'] = spider.rules[rule].callback
    clean_headers(_request['headers'], spider.settings)
    # Re-parse each meta value, skipping the internal '_autounit' bookkeeping key.
    _meta = {}
    for key, value in _request.get('meta').items():
        if key != '_autounit':
            _meta[key] = parse_object(value, spider)
    _request['meta'] = _meta
    return _request
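A rough usage sketch follows. clean_headers and parse_object are project-specific helpers whose bodies are not shown above, so they are stubbed here as hypothetical pass-throughs, and request_to_dict is assumed to come from scrapy.utils.reqser (scrapy.utils.request in Scrapy 2.6+):

import scrapy
from scrapy.spiders import CrawlSpider
from scrapy.utils.reqser import request_to_dict  # scrapy.utils.request in >= 2.6

def clean_headers(headers, settings):   # hypothetical stub for the real helper
    headers.pop(b'User-Agent', None)

def parse_object(value, spider):        # hypothetical stub: pass values through
    return value

class DemoSpider(scrapy.Spider):        # placeholder spider
    name = 'demo'
    settings = {}                       # normally injected by the crawler

request = scrapy.Request('https://example.com', meta={'depth': 1})
print(parse_request(request, DemoSpider()))  # callback falls back to 'parse'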
Example 2: get_filter_attrs
# Required import: from scrapy import spiders [as alias]
# Or: from scrapy.spiders import CrawlSpider [as alias]
def get_filter_attrs(spider):
    # Base set of spider attributes to filter out.
    attrs = {'crawler', 'settings', 'start_urls'}
    if isinstance(spider, CrawlSpider):
        # CrawlSpider instances additionally carry their rule definitions.
        attrs |= {'rules', '_rules'}
    return attrs
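A quick demonstration of the two branches, using throwaway spider classes whose names are made up for illustration:

import scrapy
from scrapy.spiders import CrawlSpider

class PlainSpider(scrapy.Spider):
    name = 'plain'

class RuledSpider(CrawlSpider):
    name = 'ruled'

print(get_filter_attrs(PlainSpider()))   # {'crawler', 'settings', 'start_urls'}
print(get_filter_attrs(RuledSpider()))   # additionally contains 'rules', '_rules'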
Example 3: parse_start_url
# Required import: from scrapy import spiders [as alias]
# Or: from scrapy.spiders import CrawlSpider [as alias]
# Also uses scrapy.Request, so: import scrapy
def parse_start_url(self, response):
    """By default, CrawlSpider first builds Requests from start_urls,
    then calls back parse_start_url with each response."""
    li_list = response.xpath('//*[@id="post_container"]/li')
    for li_div in li_list:
        # Follow each post's thumbnail link to its detail page.
        link = li_div.xpath('.//div[@class="thumbnail"]/a/@href').extract_first()
        yield scrapy.Request(link, callback=self.parse_detail_url)
    # Paginate: re-enter this method for the next listing page.
    next_page = response.xpath('//div[@class="pagination"]/a[@class="next"]/@href').extract_first()
    if next_page:
        yield scrapy.Request(next_page, callback=self.parse_start_url)
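For context, this method would sit inside a CrawlSpider subclass roughly like the sketch below; the class name, start URL, and parse_detail_url body are assumptions, not part of the original example:

import scrapy
from scrapy.spiders import CrawlSpider

class PostSpider(CrawlSpider):                 # hypothetical host class
    name = 'posts'                             # assumed name
    start_urls = ['http://blog.example.com/']  # placeholder listing URL

    # parse_start_url from Example 3 goes here.

    def parse_detail_url(self, response):      # assumed detail-page callback
        yield {'title': response.xpath('//h1/text()').get()}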