

Python Scraper.set_started_callback Method Code Examples

This article collects typical usage examples of the scraper.Scraper.set_started_callback method in Python. If you are wondering how Scraper.set_started_callback is used in practice, the selected example below may help. You can also explore further usage examples of the containing class, scraper.Scraper.


One code example of the Scraper.set_started_callback method is shown below. Examples are sorted by popularity by default.

Example 1: ScraperWrapper

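# Required import: from scraper import Scraper [as alias]
# Or: from scraper.Scraper import set_started_callback [as alias]
import datetime
import json
import threading
import uuid

import pika
from scraper import Scraper
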
class ScraperWrapper(threading.Thread):

    def __init__(self,address='localhost',exchange='barkingowl',broadcast_interval=5,DEBUG=False):
        """
        __init__() sets up the message bus, initializes the thread, and sets up
        local status variables.
        """

        threading.Thread.__init__(self)

        self.uid = str(uuid.uuid4())
        self.address = address
        self.exchange = exchange
        self.DEBUG=DEBUG
        self.interval = broadcast_interval

        # create scraper instance
        self.scraper = Scraper(uid=self.uid)
        self.scraping = False
        self.scraper_thread = None

        # stop control
        self.stopped = False

        #setup message bus
        self.respcon = pika.BlockingConnection(pika.ConnectionParameters(
                                                           host=self.address))
        self.respchan = self.respcon.channel()
        self.respchan.exchange_declare(exchange=self.exchange,type='fanout')

        self.reqcon = pika.BlockingConnection(pika.ConnectionParameters(host=address))
        self.reqchan = self.reqcon.channel()
        self.reqchan.exchange_declare(exchange=exchange,type='fanout')
        result = self.reqchan.queue_declare(exclusive=True)
        queue_name = result.method.queue
        self.reqchan.queue_bind(exchange=exchange,queue=queue_name)
        self.reqchan.basic_consume(self._reqcallback,queue=queue_name,no_ack=True)

        # start our announcement of availability
        threading.Timer(self.interval, self.broadcast_available).start()

        if self.DEBUG:
            print("Scraper Wrapper INIT complete.")

    def run(self):
        """
        run() is called by the threading subsystem when ScraperWrapper.start() is called.  This function
        sets up all of the callbacks needed, and begins consuming on the message bus.
        """
        # set up callbacks
        self.scraper.set_finished_callback(self.scraper_finished_callback)
        self.scraper.set_started_callback(self.scraper_started_callback)
        self.scraper.set_broadcast_document_callback(self.scraper_broadcast_document_callback)

        # broadcast availability
        self.broadcast_available()
        self.reqchan.start_consuming()

    def stop(self):
        """
        stop() is called to stop consuming on the message bus, and to stop the scraper from running.
        """
        #self.scraper.stop()
        #if self.scraper_thread != None:
        #    self.scraper_thread.stop()
        self.reqchan.stop_consuming()
        self.stopped = True

    def reset_scraper(self):
        """
        reset_scraper() calls reset() within the Scraper class.  This resets the state of the scraper.
        This should not be called unless the scraper has been stopped.
        """
        self.scraper.reset()

    def broadcast_available(self):
        """
        broadcast_available() broadcasts a message to the message bus saying the scraper is available
        to be dispatched a new url to begin scraping.
        """

        # make sure we are not currently scraping
        if not self.scraper.status['busy']:

            packet = {
                'available_datetime': str(datetime.datetime.now())
            }
            payload = {
                'command': 'scraper_available',
                'source_id': self.uid,
                'destination_id': 'broadcast',
                'message': packet
            }
            jbody = json.dumps(payload)
            self.respchan.basic_publish(exchange=self.exchange,routing_key='',body=jbody)

        # broadcast our simple status to the bus
        self.broadcast_simple_status()

        #
#......... part of the code is omitted here .........
Developer: reustonium, Project: BarkingOwl, Lines of code: 103, Source: scraperwrapper.py
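
The listing above registers callbacks on a Scraper instance but does not show the Scraper class itself. The following is a minimal sketch of how such a callback-setter pattern could work, not the BarkingOwl implementation. The class MiniScraper, its scrape() method, and the (uid, url) callback signature are all assumptions made for illustration; only the setter names set_started_callback and set_finished_callback and the status['busy'] flag come from the example.

```python
class MiniScraper(object):
    """Hypothetical stand-in for scraper.Scraper (illustration only)."""

    def __init__(self, uid):
        self.uid = uid
        self.status = {'busy': False}
        self._started_callback = None
        self._finished_callback = None

    def set_started_callback(self, callback):
        # Store the callable; it is invoked when a scrape begins.
        self._started_callback = callback

    def set_finished_callback(self, callback):
        # Store the callable; it is invoked when a scrape ends.
        self._finished_callback = callback

    def scrape(self, url):
        # Assumed entry point: mark busy, notify, do work, notify again.
        self.status['busy'] = True
        if self._started_callback is not None:
            self._started_callback(self.uid, url)
        # ... real scraping work would happen here ...
        self.status['busy'] = False
        if self._finished_callback is not None:
            self._finished_callback(self.uid, url)


events = []


def on_started(uid, url):
    # Assumed callback signature: (uid, url).
    events.append(('started', uid, url))


def on_finished(uid, url):
    events.append(('finished', uid, url))


scraper = MiniScraper(uid='demo')
scraper.set_started_callback(on_started)
scraper.set_finished_callback(on_finished)
scraper.scrape('http://example.com')
```

Registering the callbacks before the work starts, as run() does in the example, means the wrapper does not miss the first "started" notification.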


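Note: The scraper.Scraper.set_started_callback examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as Github/MSDocs. The code snippets were selected from open-source projects contributed by their respective developers, and copyright of the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.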