Java Parser.parseInput Method Code Examples

This article collects typical usage examples of the org.jsoup.parser.Parser.parseInput method in Java. If you are wondering how Parser.parseInput is used in practice, the selected examples below may help. You can also look further into usage examples of the enclosing class, org.jsoup.parser.Parser.


Eight code examples of Parser.parseInput are shown below, sorted by popularity by default.

Example 1: parseHtmlTemplate

import org.jsoup.parser.Parser; // import the package/class this method depends on
/**
 * Parse a given HTML template and return a result object containing the expressions
 * and a transformed HTML.
 * @param htmlTemplate The HTML template to process, as a String
 * @param context Context of the Component we are currently processing
 * @return A {@link TemplateParserResult} containing the processed template and expressions
 */
public TemplateParserResult parseHtmlTemplate(String htmlTemplate,
    TemplateParserContext context)
{
    this.context = context;
    Parser parser = Parser.htmlParser();
    parser.settings(new ParseSettings(true, true)); // tag, attribute preserve case
    Document doc = parser.parseInput(htmlTemplate, "");

    result = new TemplateParserResult();
    processImports(doc);
    processNode(doc);

    result.setProcessedTemplate(doc.body().html());
    return result;
}
 
Developer ID: Axellience, Project: vue-gwt, Lines of code: 23, Source: TemplateParser.java

Example 2: parseByteData

import org.jsoup.parser.Parser; // import the package/class this method depends on
static Document parseByteData(ByteBuffer byteData, String charsetName, String baseUri, Parser parser) {
    String docData;
    Document doc = null;

    // look for BOM - overrides any other header or input
    charsetName = detectCharsetFromBom(byteData, charsetName);

    if (charsetName == null) { // determine from meta. safe first parse as UTF-8
        // look for <meta http-equiv="Content-Type" content="text/html;charset=gb2312"> or HTML5 <meta charset="gb2312">
        docData = Charset.forName(defaultCharset).decode(byteData).toString();
        doc = parser.parseInput(docData, baseUri);
        Element meta = doc.select("meta[http-equiv=content-type], meta[charset]").first();
        String foundCharset = null; // if not found, will keep utf-8 as best attempt
        if (meta != null) {
            if (meta.hasAttr("http-equiv")) {
                foundCharset = getCharsetFromContentType(meta.attr("content"));
            }
            if (foundCharset == null && meta.hasAttr("charset")) {
                foundCharset = meta.attr("charset");
            }
        }
        // look for <?xml encoding='ISO-8859-1'?>
        if (foundCharset == null && doc.childNodeSize() > 0 && doc.childNode(0) instanceof XmlDeclaration) { // size check avoids an out-of-bounds access on an empty document
            XmlDeclaration prolog = (XmlDeclaration) doc.childNode(0);
            if (prolog.name().equals("xml")) {
                foundCharset = prolog.attr("encoding");
            }
        }
        foundCharset = validateCharset(foundCharset);

        if (foundCharset != null && !foundCharset.equals(defaultCharset)) { // need to re-decode
            foundCharset = foundCharset.trim().replaceAll("[\"']", "");
            charsetName = foundCharset;
            byteData.rewind();
            docData = Charset.forName(foundCharset).decode(byteData).toString();
            doc = null;
        }
    } else { // specified by content type header (or by user on file load)
        Validate.notEmpty(charsetName, "Must set charset arg to character set of file to parse. Set to null to attempt to detect from HTML");
        docData = Charset.forName(charsetName).decode(byteData).toString();
    }
    if (doc == null) {
        doc = parser.parseInput(docData, baseUri);
        doc.outputSettings().charset(charsetName);
    }
    return doc;
}
 
Developer ID: cpusoft, Project: common, Lines of code: 48, Source: DataUtil.java
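
Example 2 calls getCharsetFromContentType and validateCharset without showing them. The sketch below is one plausible shape for that step, written with only the Java standard library; it is not the jsoup source. The class name ContentTypeCharset, the method name fromContentType, and the regular expression are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: pulls a usable charset name out of a
// Content-Type value such as "text/html; charset=UTF-8".
final class ContentTypeCharset {
    // Case-insensitive "charset=", an optional opening quote, then the name itself.
    private static final Pattern CHARSET =
            Pattern.compile("(?i)\\bcharset=\\s*[\"']?([^\\s,;\"']+)");

    /** Returns the declared charset name if this JVM supports it, otherwise null. */
    static String fromContentType(String contentType) {
        if (contentType == null) return null;
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) return null;
        String name = m.group(1).trim();
        try {
            // isSupported throws for syntactically illegal names, so both cases map to null.
            return Charset.isSupported(name) ? name : null;
        } catch (IllegalCharsetNameException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=UTF-8"));
        System.out.println(fromContentType("text/html"));
    }
}
```

Returning null for both a missing and an unsupported charset lets the caller fall back to its default decoding, which matches how the examples treat a null foundCharset.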

Example 3: parseByteData

import org.jsoup.parser.Parser; // import the package/class this method depends on
static Document parseByteData(ByteBuffer byteData, String charsetName, String baseUri, Parser parser) {
    String docData;
    Document doc = null;
    if (charsetName == null) { // determine from meta. safe parse as UTF-8
        // look for <meta http-equiv="Content-Type" content="text/html;charset=gb2312"> or HTML5 <meta charset="gb2312">
        docData = Charset.forName(defaultCharset).decode(byteData).toString();
        doc = parser.parseInput(docData, baseUri);
        Element meta = doc.select("meta[http-equiv=content-type], meta[charset]").first();
        if (meta != null) { // if not found, will keep utf-8 as best attempt
            String foundCharset = null;
            if (meta.hasAttr("http-equiv")) {
                foundCharset = getCharsetFromContentType(meta.attr("content"));
            }
            if (foundCharset == null && meta.hasAttr("charset")) {
                try {
                    if (Charset.isSupported(meta.attr("charset"))) {
                        foundCharset = meta.attr("charset");
                    }
                } catch (IllegalCharsetNameException e) {
                    foundCharset = null;
                }
            }

            if (foundCharset != null && foundCharset.length() != 0 && !foundCharset.equals(defaultCharset)) { // need to re-decode
                foundCharset = foundCharset.trim().replaceAll("[\"']", "");
                charsetName = foundCharset;
                byteData.rewind();
                docData = Charset.forName(foundCharset).decode(byteData).toString();
                doc = null;
            }
        }
    } else { // specified by content type header (or by user on file load)
        Validate.notEmpty(charsetName, "Must set charset arg to character set of file to parse. Set to null to attempt to detect from HTML");
        docData = Charset.forName(charsetName).decode(byteData).toString();
    }
    // UTF-8 BOM indicator. Takes precedence over everything else. Rarely used. Re-decodes in case the above decoded incorrectly
    if (docData.length() > 0 && docData.charAt(0) == UNICODE_BOM) {
        byteData.rewind();
        docData = Charset.forName(defaultCharset).decode(byteData).toString();
        docData = docData.substring(1);
        charsetName = defaultCharset;
        doc = null;
    }
    if (doc == null) {
        doc = parser.parseInput(docData, baseUri);
        doc.outputSettings().charset(charsetName);
    }
    return doc;
}
 
Developer ID: rogerxaic, Project: gestock, Lines of code: 50, Source: DataUtil.java

Example 4: parseByteData

import org.jsoup.parser.Parser; // import the package/class this method depends on
static Document parseByteData(ByteBuffer byteData, String charsetName, String baseUri, Parser parser) {
    String docData;
    Document doc = null;
    if (charsetName == null) { // determine from meta. safe parse as UTF-8
        // look for <meta http-equiv="Content-Type" content="text/html;charset=gb2312"> or HTML5 <meta charset="gb2312">
        docData = Charset.forName(defaultCharset).decode(byteData).toString();
        doc = parser.parseInput(docData, baseUri);
        Element meta = doc.select("meta[http-equiv=content-type], meta[charset]").first();
        if (meta != null) { // if not found, will keep utf-8 as best attempt
            String foundCharset;
            if (meta.hasAttr("http-equiv")) {
                foundCharset = getCharsetFromContentType(meta.attr("content"));
                if (foundCharset == null && meta.hasAttr("charset")) {
                    try {
                        if (Charset.isSupported(meta.attr("charset"))) {
                            foundCharset = meta.attr("charset");
                        }
                    } catch (IllegalCharsetNameException e) {
                        foundCharset = null;
                    }
                }
            } else {
                foundCharset = meta.attr("charset");
            }

            if (foundCharset != null && foundCharset.length() != 0 && !foundCharset.equals(defaultCharset)) { // need to re-decode
                foundCharset = foundCharset.trim().replaceAll("[\"']", "");
                charsetName = foundCharset;
                byteData.rewind();
                docData = Charset.forName(foundCharset).decode(byteData).toString();
                doc = null;
            }
        }
    } else { // specified by content type header (or by user on file load)
        Validate.notEmpty(charsetName, "Must set charset arg to character set of file to parse. Set to null to attempt to detect from HTML");
        docData = Charset.forName(charsetName).decode(byteData).toString();
    }
    // UTF-8 BOM indicator (65279 is U+FEFF). Takes precedence over everything else. Rarely used. Re-decodes in case the above decoded incorrectly
    if (docData.length() > 0 && docData.charAt(0) == 65279) {
        byteData.rewind();
        docData = Charset.forName(defaultCharset).decode(byteData).toString();
        docData = docData.substring(1);
        charsetName = defaultCharset;
        doc = null;
    }
    if (doc == null) {
        doc = parser.parseInput(docData, baseUri);
        doc.outputSettings().charset(charsetName);
    }
    return doc;
}
 
Developer ID: shannah, Project: CN1ML-NetbeansModule, Lines of code: 52, Source: DataUtil.java

Example 5: parseByteData

import org.jsoup.parser.Parser; // import the package/class this method depends on
static Document parseByteData(ByteBuffer byteData, String charsetName, String baseUri, Parser parser) {
    String docData;
    Document doc = null;

    // look for BOM - overrides any other header or input
    byteData.mark();
    byte[] bom = new byte[4];
    if (byteData.remaining() >= bom.length) {
        byteData.get(bom);
        byteData.rewind();
    }
    if (bom[0] == 0x00 && bom[1] == 0x00 && bom[2] == (byte) 0xFE && bom[3] == (byte) 0xFF || // BE
            bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE && bom[2] == 0x00 && bom[3] == 0x00) { // LE
        charsetName = "UTF-32"; // and I hope it's on your system
    } else if (bom[0] == (byte) 0xFE && bom[1] == (byte) 0xFF || // BE
            bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE) {
        charsetName = "UTF-16"; // in all Javas
    } else if (bom[0] == (byte) 0xEF && bom[1] == (byte) 0xBB && bom[2] == (byte) 0xBF) {
        charsetName = "UTF-8"; // in all Javas
        byteData.position(3); // 16 and 32 decoders consume the BOM to determine be/le; utf-8 should be consumed
    }

    if (charsetName == null) { // determine from meta. safe parse as UTF-8
        // look for <meta http-equiv="Content-Type" content="text/html;charset=gb2312"> or HTML5 <meta charset="gb2312">
        docData = Charset.forName(defaultCharset).decode(byteData).toString();
        doc = parser.parseInput(docData, baseUri);
        Element meta = doc.select("meta[http-equiv=content-type], meta[charset]").first();
        if (meta != null) { // if not found, will keep utf-8 as best attempt
            String foundCharset = null;
            if (meta.hasAttr("http-equiv")) {
                foundCharset = getCharsetFromContentType(meta.attr("content"));
            }
            if (foundCharset == null && meta.hasAttr("charset")) {
                try {
                    if (Charset.isSupported(meta.attr("charset"))) {
                        foundCharset = meta.attr("charset");
                    }
                } catch (IllegalCharsetNameException e) {
                    foundCharset = null;
                }
            }

            if (foundCharset != null && foundCharset.length() != 0 && !foundCharset.equals(defaultCharset)) { // need to re-decode
                foundCharset = foundCharset.trim().replaceAll("[\"']", "");
                charsetName = foundCharset;
                byteData.rewind();
                docData = Charset.forName(foundCharset).decode(byteData).toString();
                doc = null;
            }
        }
    } else { // specified by content type header (or by user on file load)
        Validate.notEmpty(charsetName, "Must set charset arg to character set of file to parse. Set to null to attempt to detect from HTML");
        docData = Charset.forName(charsetName).decode(byteData).toString();
    }
    if (doc == null) {
        doc = parser.parseInput(docData, baseUri);
        doc.outputSettings().charset(charsetName);
    }
    return doc;
}
 
Developer ID: SpoonLabs, Project: astor, Lines of code: 61, Source: DataUtil.java
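
The BOM check at the top of Example 5 can be pulled out into a small standalone helper. The following is a minimal sketch using only the standard library; BomSniffer and detect are hypothetical names, not jsoup API. It differs from the example in two ways that should be treated as assumptions: it reads through a duplicate of the buffer so the caller's position is left alone, and it also recognizes a BOM in buffers shorter than four bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of jsoup: maps a leading byte-order mark to a charset name.
final class BomSniffer {
    /** Returns "UTF-32", "UTF-16" or "UTF-8" if the buffer starts with that BOM, otherwise null. */
    static String detect(ByteBuffer data) {
        byte[] bom = new byte[4];
        int n = Math.min(bom.length, data.remaining());
        // duplicate() shares the bytes but has its own position, so 'data' is not advanced.
        data.duplicate().get(bom, 0, n);

        // UTF-32 is tested first because its little-endian BOM (FF FE 00 00)
        // begins with the UTF-16 little-endian BOM (FF FE).
        if (n >= 4
                && (bom[0] == 0x00 && bom[1] == 0x00 && bom[2] == (byte) 0xFE && bom[3] == (byte) 0xFF
                || bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE && bom[2] == 0x00 && bom[3] == 0x00)) {
            return "UTF-32";
        }
        if (n >= 2
                && (bom[0] == (byte) 0xFE && bom[1] == (byte) 0xFF
                || bom[0] == (byte) 0xFF && bom[1] == (byte) 0xFE)) {
            return "UTF-16";
        }
        if (n >= 3 && bom[0] == (byte) 0xEF && bom[1] == (byte) 0xBB && bom[2] == (byte) 0xBF) {
            return "UTF-8";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'p', '>'};
        System.out.println(detect(ByteBuffer.wrap(utf8WithBom)));
    }
}
```

A caller would still need to skip the three UTF-8 BOM bytes before decoding, as Example 5 does with byteData.position(3).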

Example 6: parseInputStream

import org.jsoup.parser.Parser; // import the package/class this method depends on
static Document parseInputStream(InputStream input, String charsetName, String baseUri, Parser parser) throws IOException  {
    if (input == null) // empty body
        return new Document(baseUri);

    if (!(input instanceof ConstrainableInputStream))
        input = new ConstrainableInputStream(input, bufferSize, 0);

    Document doc = null;
    boolean fullyRead = false;

    // read the start of the stream and look for a BOM or meta charset
    input.mark(firstReadBufferSize);
    ByteBuffer firstBytes = readToByteBuffer(input, firstReadBufferSize - 1); // -1 because we read one more to see if completed
    fullyRead = input.read() == -1;
    input.reset();

    // look for BOM - overrides any other header or input
    BomCharset bomCharset = detectCharsetFromBom(firstBytes, charsetName);
    if (bomCharset != null) {
        charsetName = bomCharset.charset;
        input.skip(bomCharset.offset);
    }

    if (charsetName == null) { // determine from meta. safe first parse as UTF-8
        String docData = Charset.forName(defaultCharset).decode(firstBytes).toString();
        doc = parser.parseInput(docData, baseUri);

        // look for <meta http-equiv="Content-Type" content="text/html;charset=gb2312"> or HTML5 <meta charset="gb2312">
        Elements metaElements = doc.select("meta[http-equiv=content-type], meta[charset]");
        String foundCharset = null; // if not found, will keep utf-8 as best attempt
        for (Element meta : metaElements) {
            if (meta.hasAttr("http-equiv"))
                foundCharset = getCharsetFromContentType(meta.attr("content"));
            if (foundCharset == null && meta.hasAttr("charset"))
                foundCharset = meta.attr("charset");
            if (foundCharset != null)
                break;
        }

        // look for <?xml encoding='ISO-8859-1'?>
        if (foundCharset == null && doc.childNodeSize() > 0 && doc.childNode(0) instanceof XmlDeclaration) {
            XmlDeclaration prolog = (XmlDeclaration) doc.childNode(0);
            if (prolog.name().equals("xml"))
                foundCharset = prolog.attr("encoding");
        }
        foundCharset = validateCharset(foundCharset);
        if (foundCharset != null && !foundCharset.equalsIgnoreCase(defaultCharset)) { // need to re-decode. (case insensitive check here to match how validate works)
            foundCharset = foundCharset.trim().replaceAll("[\"']", "");
            charsetName = foundCharset;
            doc = null;
        } else if (!fullyRead) {
            doc = null;
        }
    } else { // specified by content type header (or by user on file load)
        Validate.notEmpty(charsetName, "Must set charset arg to character set of file to parse. Set to null to attempt to detect from HTML");
    }
    if (doc == null) {
        if (charsetName == null)
            charsetName = defaultCharset;
        BufferedReader reader = new BufferedReader(new InputStreamReader(input, charsetName), bufferSize);
        doc = parser.parseInput(reader, baseUri);
        doc.outputSettings().charset(charsetName);
    }
    input.close();
    return doc;
}
 
Developer ID: SpoonLabs, Project: astor, Lines of code: 67, Source: DataUtil.java
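
Example 6 sniffs the start of the stream with mark and reset before handing the whole stream to the parser. The sketch below isolates that peek step using plain java.io classes; StreamPeek and peek are hypothetical names, and the requirement that the stream supports mark/reset is an assumption of this sketch (the example gets that guarantee from ConstrainableInputStream, whose implementation is not shown here).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of jsoup: reads up to 'limit' bytes, then rewinds the stream.
final class StreamPeek {
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit); // remember the current position for at least 'limit' bytes
        byte[] buf = new byte[limit];
        int total = 0;
        while (total < limit) {
            int n = in.read(buf, total, limit - total);
            if (n == -1) break; // stream ended before 'limit' bytes were available
            total += n;
        }
        in.reset(); // the next read starts again at the marked position
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<meta charset=\"utf-8\"><p>hi</p>".getBytes(StandardCharsets.UTF_8);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(html));
        byte[] head = peek(in, 16);
        System.out.println(new String(head, StandardCharsets.UTF_8));
    }
}
```

Because the stream is rewound, the sniffed bytes can be decoded for charset detection and the full content can still be read afterwards with the charset that was found.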

Example 7: parseByteData

import org.jsoup.parser.Parser; // import the package/class this method depends on
static Document parseByteData(ByteBuffer byteData, String charsetName, String baseUri, Parser parser) {
    String docData;
    Document doc = null;
    if (charsetName == null) { // determine from meta. safe parse as UTF-8
        // look for <meta http-equiv="Content-Type" content="text/html;charset=gb2312"> or HTML5 <meta charset="gb2312">
        docData = Charset.forName(defaultCharset).decode(byteData).toString();
        doc = parser.parseInput(docData, baseUri);
        Element meta = doc.select("meta[http-equiv=content-type], meta[charset]").first();
        if (meta != null) { // if not found, will keep utf-8 as best attempt

            String foundCharset;
            if (meta.hasAttr("http-equiv")) {
                foundCharset = getCharsetFromContentType(meta.attr("content"));
                if (foundCharset == null && meta.hasAttr("charset")) {
                    try {
                        if (Charset.isSupported(meta.attr("charset"))) {
                            foundCharset = meta.attr("charset");
                        }
                    } catch (IllegalCharsetNameException e) {
                        foundCharset = null;
                    }
                }
            } else {
                foundCharset = meta.attr("charset");
            }

            if (foundCharset != null && foundCharset.length() != 0 && !foundCharset.equals(defaultCharset)) { // need to re-decode
                foundCharset = foundCharset.trim().replaceAll("[\"']", "");
                charsetName = foundCharset;
                byteData.rewind();
                docData = Charset.forName(foundCharset).decode(byteData).toString();
                doc = null;
            }
        }
    } else { // specified by content type header (or by user on file load)
        Validate.notEmpty(charsetName, "Must set charset arg to character set of file to parse. Set to null to attempt to detect from HTML");
        docData = Charset.forName(charsetName).decode(byteData).toString();
    }
    if (doc == null) {
        // there are times where there is a spurious byte-order-mark at the start of the text. Shouldn't be present
        // in utf-8. If after decoding, there is a BOM, strip it; otherwise will cause the parser to go straight
        // into head mode
        if (docData.length() > 0 && docData.charAt(0) == 65279)
            docData = docData.substring(1);

        doc = parser.parseInput(docData, baseUri);
        doc.outputSettings().charset(charsetName);
    }
    return doc;
}
 
Developer ID: Nader-Sl, Project: BoL-API-Parser, Lines of code: 51, Source: DataUtil.java

Example 8: parse

import org.jsoup.parser.Parser; // import the package/class this method depends on
/**
 Parse HTML into a Document, using the provided Parser. You can provide an alternate parser, such as a simple XML
 (non-HTML) parser.

 @param html    HTML to parse
 @param baseUri The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur
 before the HTML declares a {@code <base href>} tag.
 @param parser alternate {@link Parser#xmlParser() parser} to use.
 @return sane HTML
 */
public static Document parse(String html, String baseUri, Parser parser) {
    return parser.parseInput(html, baseUri);
}
 
Developer ID: cpusoft, Project: common, Lines of code: 14, Source: Jsoup.java


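Note: The org.jsoup.parser.Parser.parseInput examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright of the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.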