當前位置: 首頁>>代碼示例>>Python>>正文


Python Helper.debug方法代碼示例

本文整理匯總了Python中Helper.Helper.debug方法的典型用法代碼示例。如果您正苦於以下問題:Python Helper.debug方法的具體用法?Python Helper.debug怎麽用?Python Helper.debug使用的例子?那麽, 這裏精選的方法代碼示例或許可以為您提供幫助。您也可以進一步了解該方法所在Helper.Helper的用法示例。


在下文中一共展示了Helper.debug方法的2個代碼示例,這些例子默認根據受歡迎程度排序。您可以為喜歡或者感覺有用的代碼點讚,您的評價將有助於係統推薦出更棒的Python代碼示例。

示例1: process_url

# 需要導入模塊: from Helper import Helper [as 別名]
# 或者: from Helper.Helper import debug [as 別名]
    def process_url(self, url):
        Helper.debug("process start")
        try:
            source = request.urlopen(url).read()
        except:
            return set()
        Helper.debug("process 1:db")
        
        self.db_cache(url, source)

        #db = sqlite3.connect("data/pages.db")
        #cursor = db.cursor()
        #cursor.execute("""SELECT url FROM pages""")
        #all_urls = [''.join(item) for item in cursor.fetchall()]
        #if url in all_urls:
        #    cursor.execute("""
        #        UPDATE pages SET html = ? WHERE url = ? """, (source, url))
        #else:
        #    cursor.execute("""
        #        INSERT INTO pages(url, html) VALUES (?,?)""", (url, source))
        #db.commit()
        #db.close()
        
        Helper.debug("process 2:re")
        # Regex for finding links
        rgx = re.compile('a href="(\/\S+|[\/aA-zZ0-9]\S+\.\S+)"')

        linkMatches = rgx.findall(str(source))

        tempFrontier = set()

        tempFrontier.add(url)
        Helper.debug("process 3:add links")
        if self.frontier.frontQueue.qsize() < 10:
            for link in linkMatches:
                if ('https://' in link or 'http://' in link or link[0] == '/') \
                    and 'ftp.' not in link \
                    and'ftp://' not in link \
                    and 'mailto:' not in link:
                    tempFrontier.add(self.normalize_url(link, Helper.get_domain(url)))
        
        #tempFrontier = tempFrontier - set(self.get_disallowed_sites(url, 'GingerWhiskeyCrawler'))
        Helper.debug("process end")
        return tempFrontier
開發者ID:Roknahr,項目名稱:pyCrawler,代碼行數:46,代碼來源:WebCrawler.py

示例2: get_disallowed_sites

# 需要導入模塊: from Helper import Helper [as 別名]
# 或者: from Helper.Helper import debug [as 別名]
    def get_disallowed_sites(self, url, myAgent):
        Helper.debug("Get disallowed sites 1")

        domain = Helper.get_domain(url)

        if domain in self.robots.keys():
            return self.robots[domain]

        try:
            robot = request.urlopen('http://' + domain + '/robots.txt')
            Helper.debug('    Fetching robots.txt: '+domain)
        except:
            return []

        reAgent = re.compile("User-[aA]gent: *(\S+) *$")
        reDis = re.compile("Disallow: *(/\S*) *$")

        agent = None
        disallowed = {}
        Helper.debug("Get disallowed sites 2")
        for line in robot:
            l = str(line).replace("\\n", "").replace("\\r", "")[:-1]
            if reAgent.findall(l): 
                agent = reAgent.findall(l)[0]
                disallowed[agent] = []
            if reDis.findall(l): 
                if agent in disallowed:
                    disallowed[agent].append(reDis.findall(l)[0])
        Helper.debug("Get disallowed sites 3")    
        result = []
        if myAgent in disallowed:
            for link in disallowed[myAgent]:
                result.append(link)  # self.normalize_url(link, domain))
        if '*' in disallowed:
            for link in disallowed['*']:
                result.append(link)  # self.normalize_url(link, domain))
        Helper.debug("Get disallowed sites 4")
        self.robots[domain] = result
        return result
開發者ID:Roknahr,項目名稱:pyCrawler,代碼行數:41,代碼來源:WebCrawler.py


注:本文中的Helper.Helper.debug方法示例由純淨天空整理自Github/MSDocs等開源代碼及文檔管理平台,相關代碼片段篩選自各路編程大神貢獻的開源項目,源碼版權歸原作者所有,傳播和使用請參考對應項目的License;未經允許,請勿轉載。