This article collects typical usage examples of the Python method DB.addToCrawlQueue. If you have been wondering what exactly DB.addToCrawlQueue does and how to use it, the curated code examples below may help. You can also explore further usage examples of the containing class, DB.
Shown below are 3 code examples of the DB.addToCrawlQueue method, sorted by popularity by default. You can upvote the examples you like or find useful; your ratings help the system recommend better Python code examples.
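All of the examples call into a DB module whose implementation is not shown. As a rough sketch only, the in-memory stand-in below is reconstructed purely from the calls the examples make; every name and behavior here is an assumption, not the module's actual implementation:

# Hypothetical in-memory stand-in for the DB module used in the examples,
# reconstructed from the calls they make. The real module is presumably
# backed by a database; this stub is an assumption for illustration only.
_queue = {}

def addToCrawlQueue(url):
    # Enqueue the URL if it is not already queued
    _queue.setdefault(url, {'url': url})

def removeFromCrawlQueue(url):
    _queue.pop(url, None)

def inCrawlQueue(url):
    return url in _queue

def getCrawlQueue():
    # Return a snapshot so callers may mutate the queue while iterating
    return list(_queue.values())

def ensure_indexes():
    pass  # no-op in the in-memory stub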
Example 1: processCrawlJob
# Required import: import DB [as alias]
# Or: from DB import addToCrawlQueue [as alias]
def processCrawlJob(crawlJob):
    # Take the URL out of the queue while it is being crawled
    DB.removeFromCrawlQueue(crawlJob.url)
    resp = callAgent(crawlJob)
    processAgentResponse(resp)
    # Re-queue the URL so it stays in rotation for future crawls
    DB.addToCrawlQueue(crawlJob.url)
    crawlJob.success = True
    return crawlJob
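Example 1 relies on several helpers that the excerpt does not define. A minimal sketch of plausible stand-ins, assuming a simple CrawlJob container and no-op agent helpers (all names and behaviors here are hypothetical):

class CrawlJob(object):
    # Hypothetical container matching the attributes the examples use
    def __init__(self, agentName, agentUrl, url):
        self.agentName = agentName
        self.agentUrl = agentUrl
        self.url = url
        self.success = False

def callAgent(crawlJob):
    # Stub: a real implementation would ask the agent to fetch crawlJob.url
    return {'url': crawlJob.url, 'links': []}

def processAgentResponse(resp):
    # Stub: a real implementation would store results and enqueue new links
    pass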
Example 2: tryAddUrlToQueue
# Required import: import DB [as alias]
# Or: from DB import addToCrawlQueue [as alias]
def tryAddUrlToQueue(url):
    if urlAllowed(url):
        DB.addToCrawlQueue(url)
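Usage is straightforward. The urlAllowed predicate is not shown in the excerpt, so the filter below is only an assumed example of what it might check:

# Hypothetical filter: accept only http(s) URLs. The real urlAllowed
# is not part of the excerpt, so this predicate is an assumption.
def urlAllowed(url):
    return url.startswith('http://') or url.startswith('https://')

tryAddUrlToQueue('https://example.com/')  # queued
tryAddUrlToQueue('ftp://example.com/')    # filtered out, nothing queued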
Example 3: len
# Required import: import DB [as alias]
# Or: from DB import addToCrawlQueue [as alias]
if len(cr['serverErrors']) > 0 or len(cr['browserErrors']) > 0:
    cr['errorsPresent'] = True

def processCrawlJob(crawlJob):
    DB.removeFromCrawlQueue(crawlJob.url)
    resp = callAgent(crawlJob)
    processAgentResponse(resp)
    DB.addToCrawlQueue(crawlJob.url)
    crawlJob.success = True
    return crawlJob
running = True

if __name__ == '__main__':
    pool = eventlet.GreenPool(size=4 * len(agents))
    DB.ensure_indexes()
    # Seed the queue with the configured start URL if it is not already queued
    if not DB.inCrawlQueue(config['startUrl']):
        DB.addToCrawlQueue(config['startUrl'])
    while running:
        for crawlDoc in DB.getCrawlQueue():
            if urlAllowed(crawlDoc['url']):
                # Fan the URL out to every agent as a separate green thread
                for agent in agents:
                    job = CrawlJob(agent['name'], agent['url'], crawlDoc['url'])
                    pool.spawn(processCrawlJob, job)
            else:
                print("Removing URL:", crawlDoc['url'])
                DB.removeFromCrawlQueue(crawlDoc['url'])
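Example 3 fans crawl jobs out with eventlet green threads. As a standalone illustration of the same GreenPool pattern (the worker function and URLs below are placeholders, not the crawler's real code):

import eventlet
eventlet.monkey_patch()  # make standard-library I/O cooperative

def worker(url):
    # Placeholder for the real crawl job
    print("crawling", url)

pool = eventlet.GreenPool(size=8)
for url in ['https://example.com/a', 'https://example.com/b']:
    pool.spawn(worker, url)
pool.waitall()  # block until every spawned green thread has finished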