Python HTMLParser.feed Method Code Examples

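This article collects typical usage examples of the html.parser.HTMLParser.feed method in Python. If you are wondering how HTMLParser.feed is used in practice, or are looking for working examples of it, the selected code samples here may help. You can also explore further usage examples of the class this method belongs to, html.parser.HTMLParser.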

A total of 15 code examples of the HTMLParser.feed method are shown below, sorted by popularity by default.
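For orientation, here is a minimal sketch of what `feed` does in the standard library: it accepts a chunk of text and calls handler methods as complete elements are recognized, buffering anything incomplete. The `LinkCollector` class and the sample markup are illustrative assumptions and do not come from the projects listed below.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)


parser = LinkCollector()
# feed() may be called repeatedly; an incomplete tag is buffered
# until more data arrives or close() is called.
parser.feed('<p><a href="/one">one</a>')
parser.feed('<a hre')
parser.feed('f="/two">two</a></p>')
parser.close()
print(parser.links)
```

Splitting the second link across two `feed` calls is deliberate: it shows that the parser can be driven incrementally, for example while reading a network response in chunks.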

Example 1: getFormattedHTML

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def getFormattedHTML(self, indent='  '):
        '''
            getFormattedHTML - Get this document as formatted XHTML, replacing the original whitespace
                with a pretty-printed version

            @param indent - space/tab/newline of each level of indent, or integer for how many spaces per level

            @return - <str> Formatted html

            @see getHTML - Get HTML with original whitespace

            @see getMiniHTML - Get HTML with only functional whitespace remaining
        '''
        from .Formatter import AdvancedHTMLFormatter
        html = self.getHTML()
        formatter = AdvancedHTMLFormatter(indent, None) # Do not double-encode
        formatter.feed(html)
        return formatter.getHTML()

Developer: kata198, Project: AdvancedHTMLParser, Lines of code: 20, Source file: Parser.py

Example 2: getHTML

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def getHTML(self):
        '''
            getHTML - Get the full HTML as contained within this tree, converted to valid XHTML
                @returns - String
        '''
        root = self.getRoot()
        if root is None:
            raise ValueError('Cannot format, use feed to load contents.')

        if self.doctype:
            doctypeStr = '<!%s>\n' %(self.doctype)
        else:
            doctypeStr = ''

        # 6.6.0: If we have a real root tag, print the outerHTML. If we have a fake root tag (for multiple root condition),
        #   then print the innerHTML (skipping the outer root tag). Otherwise, we will miss
        #   untagged text (between the multiple root nodes).
        if root.tagName == INVISIBLE_ROOT_TAG:
            return doctypeStr + root.innerHTML
        else:
            return doctypeStr + root.outerHTML
#        return doctypeStr + ''.join([elem.outerHTML for elem in self.getRootNodes()])

Developer: kata198, Project: AdvancedHTMLParser, Lines of code: 25, Source file: Formatter.py

Example 3: remove

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def remove(self, item):
        """
        Like list.remove, but locates the item by identity (id) rather than equality.

        data = '<a><b></b><b></b></a>'
        html = Html()
        dom = html.feed(data)

        for root, ind in dom.sail_with_root():
            if ind.name == 'b':
                root.remove(ind)

        print(dom)

        It should print:

        <a ></a>
        """

        index = self.index(item)
        del self[index]

Developer: iogf, Project: ehp, Lines of code: 23, Source file: ehp.py

Example 4: take

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def take(self, *args):
        """
        Return the first object whose attributes match
        (key0, value0), (key1, value1), ... , or None if nothing matches.

        Example:

        data = '<a><b id="foo" size="1"></b></a>'
        html = Html()
        dom = html.feed(data)

        print(dom.take(('id', 'foo')))
        print(dom.take(('id', 'foo'), ('size', '2')))
        """

        # next() with a default returns None once the iterator is exhausted
        return next(self.match(*args), None)

Developer: iogf, Project: ehp, Lines of code: 25, Source file: ehp.py

Example 5: walk_with_root

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def walk_with_root(self):
        """
        Like walk but carries root.

        Example:

        html = Html()
        data = '<body><em>alpha</em></body>'
        dom = html.feed(data)
        
        for (root, name, attr), (ind, name, attr) in dom.walk_with_root():
            print(root, name, ind, name)

        Output:

        <em >alpha</em> 1 alpha 1
        <body ><em >alpha</em></body> em <em >alpha</em> em
        <body ><em >alpha</em></body> body <body ><em >alpha</em></body> body    
        """

        for root, ind in self.sail_with_root():
            yield ((root, root.name, root.attr), 
                   (ind, ind.name, ind.attr))

Developer: iogf, Project: ehp, Lines of code: 25, Source file: ehp.py

Example 6: __init__

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def __init__(self, data):
        """
        The data holds the characters.

        Example:

        html = Html()
        data = '<body><em>alpha</em></body>'
        dom = html.feed(data)
        x = dom.fst('em')
        x.append(Data('\nbeta'))

        It outputs.

        <body ><em >alpha
        beta</em></body>
        """

        Root.__init__(self, DATA)
        self.data = data

Developer: iogf, Project: ehp, Lines of code: 22, Source file: ehp.py

Example 7: getMiniHTML

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def getMiniHTML(self):
        '''
            getMiniHTML - Gets the HTML representation of this document without any pretty formatting
                and disregarding original whitespace beyond the functional.

                @return <str> - HTML with only functional whitespace present
        '''
        from .Formatter import AdvancedHTMLMiniFormatter
        html = self.getHTML()
        formatter = AdvancedHTMLMiniFormatter(None) # Do not double-encode
        formatter.feed(html)
        return formatter.getHTML()

Developer: kata198, Project: AdvancedHTMLParser, Lines of code: 14, Source file: Parser.py
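The pattern used by getMiniHTML, feeding markup into a second parser that re-serializes it, can be sketched with the standard library alone. The `MiniFormatter` class below is a hypothetical simplification and not the AdvancedHTMLMiniFormatter implementation: it trims every text node, drops whitespace-only ones, and gives no special treatment to comments, doctypes, or self-closing tags.

```python
import html
from html.parser import HTMLParser


class MiniFormatter(HTMLParser):
    """Hypothetical re-serializer that drops whitespace-only text nodes."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True: entities arrive decoded
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Reuse the start tag exactly as it appeared in the input.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Escape exactly once, because the parser already decoded entities.
            self.parts.append(html.escape(text, quote=False))

    def get_html(self):
        return ''.join(self.parts)


formatter = MiniFormatter()
formatter.feed('<div>\n  <p> a &amp; b </p>\n</div>')
formatter.close()
print(formatter.get_html())
```

Escaping the decoded text once on output is one way to avoid double-encoding entities such as `&amp;`; the original project may handle this differently.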

Example 8: feed

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def feed(self, contents):
        '''
            feed - Feed contents. Use parseStr or parseFile instead.

            @param contents - Contents
        '''
        contents = stripIEConditionals(contents)
        try:
            HTMLParser.feed(self, contents)
        except MultipleRootNodeException:
            self.reset()
            HTMLParser.feed(self, "%s%s" %(addStartTag(contents, INVISIBLE_ROOT_TAG_START), INVISIBLE_ROOT_TAG_END))

Developer: kata198, Project: AdvancedHTMLParser, Lines of code: 14, Source file: Parser.py
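The retry logic in this `feed` override (parse, and on a multiple-root error reset and parse again inside a synthetic root) can be illustrated with a self-contained sketch. `MultipleRootError`, `SingleRootParser`, and the `wrapper` tag are hypothetical names chosen for this illustration; the project's own helpers such as `addStartTag` are not reproduced here.

```python
from html.parser import HTMLParser


class MultipleRootError(Exception):
    """Hypothetical error raised when a second top-level element appears."""


class SingleRootParser(HTMLParser):
    """Illustrative parser that insists on a single root element.

    Void elements such as <br> are not handled, to keep the sketch short.
    """

    WRAPPER = 'wrapper'  # hypothetical synthetic root tag

    def reset(self):
        # HTMLParser.__init__ calls reset(), so this also initializes state.
        super().reset()
        self.depth = 0
        self.roots = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if self.roots:
                raise MultipleRootError(tag)
            self.roots.append(tag)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def feed(self, contents):
        try:
            super().feed(contents)
        except MultipleRootError:
            # Discard the partial state, then re-parse inside one synthetic root.
            self.reset()
            super().feed('<%s>%s</%s>' % (self.WRAPPER, contents, self.WRAPPER))


parser = SingleRootParser()
parser.feed('<p>a</p><p>b</p>')
print(parser.roots)
```

Calling `reset()` before the second attempt matters: without it, the buffered input and the nodes recorded during the failed first pass would leak into the retry.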

Example 9: parseFile

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def parseFile(self, filename):
        '''
            parseFile - Parses a file and creates the DOM tree and indexes

                @param filename <str/file> - A file name or an open file object. A file object is not closed here; the caller must close it.
        '''
        self.reset()

        # Python 3 has no builtin `file` type, so treat anything with a read() method as a file object.
        if hasattr(filename, 'read'):
            contents = filename.read()
        else:
            # codecs must be imported at module level
            with codecs.open(filename, 'r', encoding=self.encoding) as f:
                contents = f.read()

        self.feed(contents)

Developer: kata198, Project: AdvancedHTMLParser, Lines of code: 17, Source file: Parser.py
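Accepting either a file name or an already open file object, as parseFile does, can be sketched with duck typing. The helper `read_contents` and the `TagCounter` class are hypothetical names for this illustration, and an in-memory `io.StringIO` stands in for a real file.

```python
import io
from html.parser import HTMLParser


def read_contents(source, encoding='utf-8'):
    """Hypothetical helper: return text from a path or an open file object."""
    if hasattr(source, 'read'):
        # The caller owns the file object, so it is not closed here.
        return source.read()
    with open(source, 'r', encoding=encoding) as f:
        return f.read()


class TagCounter(HTMLParser):
    """Hypothetical parser that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


counter = TagCounter()
counter.feed(read_contents(io.StringIO('<ul><li>a</li><li>b</li></ul>')))
counter.close()
print(counter.count)
```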

Example 10: parseStr

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def parseStr(self, html):
        '''
            parseStr - Parses a string and creates the DOM tree and indexes.

                @param html <str> - valid HTML
        '''
        self.reset()

        if isinstance(html, bytes):
            self.feed(html.decode(self.encoding))
        else:
            self.feed(html)

Developer: kata198, Project: AdvancedHTMLParser, Lines of code: 14, Source file: Parser.py
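The bytes check in parseStr exists because the standard `feed` works on `str`. A minimal sketch of the same decode-then-feed step follows; `TextCollector` and `parse_str` are hypothetical names, and UTF-8 is an assumed default encoding.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical parser that gathers text nodes."""

    def __init__(self, encoding='utf-8'):
        super().__init__()
        self.encoding = encoding
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def parse_str(self, markup):
        # Start from a clean state, then decode bytes before feeding.
        self.reset()
        self.chunks = []
        if isinstance(markup, bytes):
            markup = markup.decode(self.encoding)
        self.feed(markup)
        self.close()


collector = TextCollector()
collector.parse_str('<p>caf\u00e9</p>'.encode('utf-8'))
print(collector.chunks)
```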

Example 11: feed

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def feed(self, contents):
        '''
            feed - Load contents

            @param contents - HTML contents
        '''
        contents = stripIEConditionals(contents)
        try:
            HTMLParser.feed(self, contents)
        except MultipleRootNodeException:
            self.reset()

            HTMLParser.feed(self, "%s%s" %(addStartTag(contents, INVISIBLE_ROOT_TAG_START), INVISIBLE_ROOT_TAG_END))

Developer: kata198, Project: AdvancedHTMLParser, Lines of code: 15, Source file: Formatter.py

Example 12: parseFile

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def parseFile(self, filename):
        '''
            parseFile - Parses a file and creates the DOM tree and indexes

                @param filename <str/file> - A file name or an open file object. A file object is not closed here; the caller must close it.
        '''
        self.reset()

        # Python 3 has no builtin `file` type, so treat anything with a read() method as a file object.
        if hasattr(filename, 'read'):
            contents = filename.read()
        else:
            # codecs must be imported at module level
            with codecs.open(filename, 'r', encoding=self.encoding) as f:
                contents = f.read()
        self.feed(contents)

Developer: kata198, Project: AdvancedHTMLParser, Lines of code: 16, Source file: Formatter.py

Example 13: parseStr

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def parseStr(self, html):
        '''
            parseStr - Parses a string and creates the DOM tree and indexes.

                @param html <str> - valid HTML
        '''
        self.reset()
        if isinstance(html, bytes):
            self.feed(html.decode(self.encoding))
        else:
            self.feed(html)

Developer: kata198, Project: AdvancedHTMLParser, Lines of code: 13, Source file: Formatter.py

Example 14: sail

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def sail(self):
        """ 
        This is used to navigate through the xml/html document.
        Every xml/html object is represented by a python class
        instance that inherits from Root.
        
        The method sail is used to return an iterator
        for these objects.

        Example:
        data = '<a> <b> </b> </a>'

        html = Html()
        dom = html.feed(data)

        for ind in dom.sail():
            print(type(ind), ',', ind.name)

        It would output.

        <class 'ehp.Root'> , a
        <class 'ehp.Root'> , b
        """
           
        for indi in self[:]:
            for indj in indi.sail():
                yield indj

            yield indi

Developer: iogf, Project: ehp, Lines of code: 31, Source file: ehp.py
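The recursion in `sail` yields every descendant of a child before the child itself, which is a post-order traversal. The sketch below reproduces that order on a tiny tree; the `Node` class is a hypothetical stand-in that stores children as list items and is not the ehp `Root` class.

```python
class Node(list):
    """Hypothetical tree node that stores its children as list items."""

    def __init__(self, name, *children):
        super().__init__(children)
        self.name = name

    def sail(self):
        # Iterate over a copy so callers may modify the tree while walking.
        for child in self[:]:
            yield from child.sail()
            yield child


tree = Node('root', Node('a', Node('b'), Node('c')), Node('d'))
order = [node.name for node in tree.sail()]
print(order)
```

Note that the node on which `sail` is called is not part of the result; only its descendants are yielded.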

Example 15: index

# Required import: from html.parser import HTMLParser [as alias]
# feed is a method of that class (HTMLParser.feed); it cannot be imported on its own
def index(self, item):
        """
        Similar to list.index, but compares items by identity (id)
        instead of equality.

        Example:

        data = '<a><b></b><b></b></a>'
        html = Html()
        dom = html.feed(data)

        for root, ind in dom.sail_with_root():
            print(root.name, ind.name, root.index(ind))


        It would print:

        a b 0
        a b 1
         a 0

        The line ' a 0' corresponds to the outermost object,
        an instance of Root that contains all the other objects.
        """

        for count, ind in enumerate(self):
            if ind is item:
                return count

        raise ValueError

Developer: iogf, Project: ehp, Lines of code: 34, Source file: ehp.py
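Why compare by identity at all? For objects that define equality by content, such as lists, `list.index` returns the first equal item, which may not be the object you hold. The sketch below shows the difference using plain lists; `index_by_identity` is a hypothetical free function written for this illustration, and whether ehp nodes compare equal in this way is an assumption.

```python
def index_by_identity(seq, item):
    """Return the position of the exact object `item` in `seq`."""
    for position, candidate in enumerate(seq):
        if candidate is item:
            return position
    raise ValueError('item not found by identity')


first, second = [], []  # equal in content, but two distinct objects
items = [first, second]

print(items.index(second))               # equality-based lookup
print(index_by_identity(items, second))  # identity-based lookup
```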

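Note: The html.parser.HTMLParser.feed examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright in the source code belongs to the original authors. For distribution and use, refer to each project's License. Do not reproduce without permission.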
