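This article collects typical usage examples of the DB.inCrawlQueue method in Python. If you are wondering how DB.inCrawlQueue is used in practice, or are looking for working examples of it, the selected code example here may help. You can also look further into usage examples of DB, where the method is defined.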
One code example of the DB.inCrawlQueue method is shown below.
Example 1: len
# Required module: import DB [as alias]
# Or: from DB import inCrawlQueue [as alias]
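import eventlet

import DB

# Fragment: tail of an earlier function that the excerpt does not show;
# `cr` is defined there.
if len(cr['serverErrors']) > 0 or len(cr['browserErrors']) > 0:
    cr['errorsPresent'] = True


def processCrawlJob(crawlJob):
    # Take the URL off the queue, run the agent, and handle its response.
    DB.removeFromCrawlQueue(crawlJob.url)
    resp = callAgent(crawlJob)
    processAgentResponse(resp)
    # Kept as in the excerpt: the URL is put back on the queue afterwards.
    DB.addToCrawlQueue(crawlJob.url)
    crawlJob.success = True
    return crawlJob


running = True

if __name__ == '__main__':
    # agents, config, callAgent, processAgentResponse, urlAllowed and CrawlJob
    # are defined elsewhere in the original project.
    pool = eventlet.GreenPool(size=4 * len(agents))
    DB.ensure_indexes()
    # Seed the queue with the start URL only if it is not already queued.
    if not DB.inCrawlQueue(config['startUrl']):
        DB.addToCrawlQueue(config['startUrl'])
    while running:
        for crawlDoc in DB.getCrawlQueue():
            if urlAllowed(crawlDoc['url']):
                # Spawn one crawl job per agent for every allowed URL.
                for agent in agents:
                    job = CrawlJob(agent['name'], agent['url'], crawlDoc['url'])
                    pool.spawn(processCrawlJob, job)
            else:
                print("Removing URL: %s" % crawlDoc['url'])
                DB.removeFromCrawlQueue(crawlDoc['url'])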
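
The example above calls DB.inCrawlQueue, DB.addToCrawlQueue, DB.removeFromCrawlQueue and DB.getCrawlQueue, but the DB module itself is not shown. To make the "check, then add" seeding step easier to follow, here is a minimal in-memory sketch of what such a queue might look like. The class InMemoryCrawlQueue and the helper seed are hypothetical names introduced only for illustration; the real DB module is presumably backed by a database and may behave differently.

```python
class InMemoryCrawlQueue(object):
    """Hypothetical in-memory stand-in for the crawl-queue functions of DB."""

    def __init__(self):
        self._urls = []  # queued URLs, in insertion order

    def inCrawlQueue(self, url):
        """Return True if the URL is currently queued."""
        return url in self._urls

    def addToCrawlQueue(self, url):
        """Queue the URL unless it is already present."""
        if url not in self._urls:
            self._urls.append(url)

    def removeFromCrawlQueue(self, url):
        """Drop the URL from the queue; do nothing if it is absent."""
        if url in self._urls:
            self._urls.remove(url)

    def getCrawlQueue(self):
        """Return a snapshot of the queue as documents shaped like {'url': ...}."""
        return [{'url': u} for u in self._urls]


def seed(queue, start_url):
    """Add the start URL only if it is not queued yet (same guard as the example)."""
    if not queue.inCrawlQueue(start_url):
        queue.addToCrawlQueue(start_url)


queue = InMemoryCrawlQueue()
seed(queue, 'http://example.com/')
seed(queue, 'http://example.com/')  # second call finds the URL and adds nothing
print(queue.getCrawlQueue())
```

In this sketch getCrawlQueue returns a copy of the queue, so a loop like the one in the example can call removeFromCrawlQueue while iterating without disturbing the iteration. Whether the original DB.getCrawlQueue offers the same guarantee is not visible from the excerpt.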