Python html_parser.HTMLParser Code Examples

This article collects typical usage examples of the six.moves.html_parser.HTMLParser class in Python. If you are wondering how html_parser.HTMLParser is used in practice, or are looking for working examples of it, the selected code examples here may help. You can also explore further usage examples of the module that contains it, six.moves.html_parser.

The 9 code examples of html_parser.HTMLParser shown below are sorted by popularity by default.

Example 1: parse

# Required import: from six.moves import html_parser [as alias]
# Or: from six.moves.html_parser import HTMLParser [as alias]
def parse(self, response):
        soup = BeautifulSoup(
            response.content.decode('utf-8', 'ignore'), 'lxml')
        image_divs = soup.find_all('div', class_='imgpt')
        pattern = re.compile(r'murl\":\"(.*?)\.jpg')
        for div in image_divs:
            href_str = html_parser.HTMLParser().unescape(div.a['m'])
            match = pattern.search(href_str)
            if match:
                name = (match.group(1)
                        if six.PY3 else match.group(1).encode('utf-8'))
                img_url = '{}.jpg'.format(name)
                yield dict(file_url=img_url)

Author: hellock, Project: icrawler, Lines of code: 15, Source file: bing.py
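Several examples on this page call `HTMLParser().unescape(...)`. That method was deprecated in Python 3.4 and removed in Python 3.9, where `html.unescape()` is the standard-library replacement. The following is a minimal sketch of a version-tolerant wrapper. The helper name `decode_entities` and the sample string are assumptions made for illustration and do not come from any of the listed projects.

```python
try:
    # Python 3.4+: the standard-library replacement for HTMLParser.unescape().
    from html import unescape
except ImportError:
    # Python 2 fallback, matching the six-based examples on this page.
    from six.moves.html_parser import HTMLParser
    unescape = HTMLParser().unescape


def decode_entities(text):
    """Hypothetical helper: turn HTML character references into characters."""
    return unescape(text)


# Illustrative input, loosely shaped like the attribute parsed in Example 1.
sample = '{&quot;murl&quot;:&quot;http://example.com/cat.jpg&quot;}'
print(decode_entities(sample))
```

Binding the name `unescape` once at import time keeps the call sites identical on both Python lines, so code like Example 1 would only need to swap the function it calls.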

Example 2: __init__

# Required import: from six.moves import html_parser [as alias]
# Or: from six.moves.html_parser import HTMLParser [as alias]
def __init__(self):
        html_parser.HTMLParser.__init__(self)
        self.recording = 0
        self.data = []

Author: ScottAI, Project: -Odoo---, Lines of code: 6, Source file: helper.py

Example 3: message

# Required import: from six.moves import html_parser [as alias]
# Or: from six.moves.html_parser import HTMLParser [as alias]
def message(self, msg):
        """Process incoming message stanzas.

        Be aware that this also includes MUC messages and error messages. It is
        usually a good idea to check the message's type before processing or
        sending replies. If the message is the appropriate type, then the bot
        checks wikipedia to see if the message string exists as a page on the
        site. If so, it sends this link back to the sender in the reply.

        Arguments:
            msg -- The received message stanza. See the SleekXMPP documentation
                for stanza objects and the Message stanza to see how it may be
                used.
        """
        if msg['type'] in ('chat', 'normal'):
            msg_body = msg['body']
            encoded_body = urllib.quote_plus(msg_body)
            response = requests.get(
                'https://en.wikipedia.org/w/api.php?'
                'action=query&list=search&format=json&srprop=snippet&'
                'srsearch={}'.format(encoded_body))
            doc = json.loads(response.content)

            results = doc.get('query', {}).get('search')
            if not results:
                msg.reply('I wasn\'t able to locate info on "{}" Sorry'.format(
                    msg_body)).send()
                return

            snippet = results[0]['snippet']
            title = urllib.quote_plus(results[0]['title'])

            # Strip out html
            snippet = html_parser.HTMLParser().unescape(
                re.sub(r'<[^>]*>', '', snippet))
            msg.reply(u'{}...\n(http://en.wikipedia.org/w/?title={})'.format(
                snippet, title)).send()

Author: GoogleCloudPlatform, Project: python-docs-samples, Lines of code: 39, Source file: wikibot.py

Example 4: _string_data

# Required import: from six.moves import html_parser [as alias]
# Or: from six.moves.html_parser import HTMLParser [as alias]
def _string_data(data, data_type):
        """Replace various object types with string representations."""
        if data_type == 'json':
            return json.dumps(data)
        elif data_type == 'xml':
            if isinstance(data, str):
                return data
            str_data = ElementTree.tostring(data)
            # No way to stop tostring from HTML escaping even if we wanted
            h = html_parser.HTMLParser()
            return h.unescape(str_data.decode())
        elif data_type == 'yaml':
            return yaml.dump(data)
        else:
            return data

Author: openstack-archive, Project: syntribos, Lines of code: 17, Source file: parser.py

Example 5: __init__

# Required import: from six.moves import html_parser [as alias]
# Or: from six.moves.html_parser import HTMLParser [as alias]
def __init__(self):
        html_parser.HTMLParser.__init__(self)
        self._in_td = False
        self.data = list()

Author: grundic, Project: yagocd, Lines of code: 6, Source file: info.py
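Examples 2 and 5 show only the constructor of an HTMLParser subclass. As a rough sketch of how such a subclass is typically completed, the class below collects the text of every `<td>` cell. The class name `TableCellParser` and the handler bodies are assumptions for illustration; they are not the yagocd or Odoo implementations, which this page does not show.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical parser that gathers the text found inside <td> cells."""

    def __init__(self):
        super().__init__()
        self._in_td = False  # True while the parser is inside a <td> element
        self.data = []       # collected cell texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False

    def handle_data(self, data):
        # Text outside table cells is ignored.
        if self._in_td:
            self.data.append(data.strip())


parser = TableCellParser()
parser.feed('<table><tr><td>name</td><td> value </td></tr></table>')
parser.close()
print(parser.data)
```

The flag-based approach works for flat tables; nested tables would need a depth counter instead of a boolean.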

Example 6: get_saml_token

# Required import: from six.moves import html_parser [as alias]
# Or: from six.moves.html_parser import HTMLParser [as alias]
def get_saml_token(session, username, password, saml_cfg_id):
    """
    Log into LastPass and retrieve a SAML token for a given
    SAML configuration.
    """
    logger.debug("Getting SAML token")

    # now logged in, grab the SAML token from the IdP-initiated login
    idp_login = '%s/saml/launch/cfg/%d' % (LASTPASS_SERVER, saml_cfg_id)

    r = session.get(idp_login, verify=should_verify())

    form = extract_form(r.text)
    if not form['action']:
        # try to scrape the error message just to make it more user friendly
        error = ""
        for l in r.text.splitlines():
            match = re.search(r'<h2>(.*)</h2>', l)
            if match:
                msg = html_parser.HTMLParser().unescape(match.group(1))
                msg = msg.replace("<br/>", "\n")
                msg = msg.replace("<b>", "")
                msg = msg.replace("</b>", "")
                error = "\n" + msg

        raise ValueError("Unable to find SAML ACS" + error)

    return b64decode(form['fields']['SAMLResponse'])

Author: lastpass, Project: lp-aws-saml, Lines of code: 30, Source file: lp-aws-saml.py

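Example 7: strip_html

# Required import: from six.moves import html_parser [as alias]
# Or: from six.moves.html_parser import HTMLParser [as alias]
def strip_html(html):
    """Return the text content of ``html`` with all tags removed."""
    class MLStripper(HTMLParser):
        def __init__(self):
            # Call the base initializer (it also calls reset()) so that
            # Python 3 attributes such as convert_charrefs are set.
            HTMLParser.__init__(self)
            self.fed = []

        def handle_data(self, d):
            # Collect every text node encountered between tags.
            self.fed.append(d)

        def get_data(self):
            return ''.join(self.fed)

    p = MLStripper()
    p.feed(html)
    p.close()  # flush any text still buffered by the parser
    return p.get_data()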

Author: rmcgibbo, Project: figshare, Lines of code: 15, Source file: utils.py
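For comparison, here is a Python 3 only sketch of the same tag-stripping idea using the standard-library `html.parser` module directly. The names `TextExtractor` and `strip_tags` are hypothetical and chosen for this illustration; the sketch assumes Python 3.4 or later, where the `convert_charrefs` argument exists.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical subclass that keeps only the text between tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as
        # &amp; before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def get_text(self):
        return ''.join(self.chunks)


def strip_tags(markup):
    """Return the text content of markup with tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush text that the parser may still be buffering
    return parser.get_text()


print(strip_tags('<p>Hello <b>world</b> &amp; co</p>'))
```

Calling `close()` matters when the input ends in plain text, because with `convert_charrefs` enabled the parser holds trailing text until it knows no more input is coming.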

Example 8: _get_field

# Required import: from six.moves import html_parser [as alias]
# Or: from six.moves.html_parser import HTMLParser [as alias]
def _get_field(self, field, default=''):
        val = self.params.get(field, [default])
        val = val[0] if isinstance(val, list) else val
        return HTMLParser().unescape(val)

Author: pinterest, Project: git-stacktrace, Lines of code: 6, Source file: server.py

Example 9: _highlight

# Required import: from six.moves import html_parser [as alias]
# Or: from six.moves.html_parser import HTMLParser [as alias]
def _highlight(html):
    """Syntax-highlights HTML-rendered Markdown.

    Plucks sections to highlight that conform to the GitHub fenced code info
    string as defined at https://github.github.com/gfm/#info-string.

    Args:
        html (str): The rendered HTML.

    Returns:
        str: The HTML with Pygments syntax highlighting applied to all code
            blocks.
    """

    formatter = pygments.formatters.HtmlFormatter(nowrap=True)

    code_expr = re.compile(
        r'<pre><code class="language-(?P<lang>.+?)">(?P<code>.+?)'
        r'</code></pre>', re.DOTALL)

    def replacer(match):
        try:
            lang = match.group('lang')
            lang = _LANG_ALIASES.get(lang, lang)
            lexer = pygments.lexers.get_lexer_by_name(lang)
        except ValueError:
            lexer = pygments.lexers.TextLexer()

        code = match.group('code')

        # Decode html entities in the code. cmark tries to be helpful and
        # translate '"' to '&quot;', but it confuses pygments. Pygments will
        # escape any html entities when re-writing the code, and we run
        # everything through bleach after.
        code = html_parser.HTMLParser().unescape(code)

        highlighted = pygments.highlight(code, lexer, formatter)

        return '<pre>{}</pre>'.format(highlighted)

    result = code_expr.sub(replacer, html)

    return result

Author: pypa, Project: readme_renderer, Lines of code: 45, Source file: markdown.py

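Note: The six.moves.html_parser.HTMLParser examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright of the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.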