

Java DOMFragmentParser Class Code Examples

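This article collects typical usage examples of org.cyberneko.html.parsers.DOMFragmentParser in Java. DOMFragmentParser is the NekoHTML parser that parses HTML into a caller-supplied DOM DocumentFragment through parse(InputSource, DocumentFragment). If you want to see how the class is used in real projects, the examples below may help.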


DOMFragmentParser belongs to the org.cyberneko.html.parsers package. Six code examples of the class are shown below, sorted by popularity by default.

Example 1: htmlToText

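import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import org.apache.xerces.dom.CoreDocumentImpl;
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
import org.w3c.dom.DocumentFragment;
import org.xml.sax.InputSource;

/**
 * Extracts the plain text from an HTML string.
 *
 * @param html the HTML to convert
 * @return the text content of the HTML, or null if parsing fails
 */
public static String htmlToText(String html) {

	DOMFragmentParser parser = new DOMFragmentParser();
	StringBuffer buffer = new StringBuffer();

	// try-with-resources closes the stream even if parsing throws
	try (ByteArrayInputStream fin = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
		InputSource inSource = new InputSource(fin);
		// declare the charset that was actually used to encode the bytes above
		inSource.setEncoding("UTF-8");
		CoreDocumentImpl codeDoc = new CoreDocumentImpl();
		DocumentFragment doc = codeDoc.createDocumentFragment();
		parser.parse(inSource, doc);
		// processNode(StringBuffer, Node) is a helper defined elsewhere in HTMLTextParser
		processNode(buffer, doc);
	} catch (Exception e) {
		return null;
	}

	return buffer.toString();
}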
 
Developer ID: MobileManAG, Project: Project-H-Backend, Lines of code: 24, Source: HTMLTextParser.java
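Example 1 calls processNode(buffer, doc) without showing it. The following is a minimal sketch of what such a helper might look like, assuming it simply concatenates text nodes in document order. The class name HtmlTextCollector and the choice to skip SCRIPT and STYLE contents are illustrative assumptions, not taken from Project-H-Backend. It uses only the JDK's org.w3c.dom API, so it works on any DOM tree, including the fragment NekoHTML fills in.

```java
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * Hypothetical stand-in for the processNode(buffer, node) helper used in Example 1;
 * the real Project-H-Backend implementation is not shown.
 */
final class HtmlTextCollector {

    private HtmlTextCollector() {
    }

    /** Appends the value of every text node under {@code node}, depth-first, in document order. */
    static void processNode(StringBuffer buffer, Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            buffer.append(node.getNodeValue());
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            // Assumption: script and style bodies are not visible text, so skip them.
            // Compare case-insensitively because NekoHTML upper-cases element names by default.
            if ("script".equalsIgnoreCase(name) || "style".equalsIgnoreCase(name)) {
                return;
            }
        }
        // Any other node (fragment, element, ...): recurse into its children.
        // Comment nodes have no children, so their content is never appended.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            processNode(buffer, children.item(i));
        }
    }
}
```

Because the helper appends to a shared StringBuffer, it matches the processNode(buffer, doc) call shape in Example 1 without allocating intermediate strings.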

Example 2: stringToNode

import java.io.StringReader;

import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

protected Node stringToNode(String str) {
	try {
		final DOMFragmentParser parser = new DOMFragmentParser();
		// 'document' is assumed to be the enclosing class's DOM Document field (not shown)
		final DocumentFragment fragment = document.createDocumentFragment();
		// a StringReader supplies characters directly, so no charset decoding is involved
		parser.parse(new InputSource(new StringReader(str)), fragment);
		return fragment;

		// try and return the element itself if possible...
		// NodeList nl = fragment.getChildNodes();
		// for (int i=0; i<nl.getLength(); i++) if (nl.item(i).getNodeType()
		// == Node.ELEMENT_NODE) return nl.item(i);
		// return fragment;

	} catch (final Exception e) {
		throw new RuntimeException(e);
	}
}
 
Developer ID: openimaj, Project: openimaj, Lines of code: 18, Source: Readability.java
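Example 2 keeps, commented out, an alternative that returns the fragment's first element instead of the whole fragment. A standalone sketch of that alternative, under the hypothetical name FragmentNodes.firstElementOrSelf, might look like this:

```java
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper mirroring the commented-out branch in Example 2. */
final class FragmentNodes {

    private FragmentNodes() {
    }

    /**
     * Returns the first element child of the fragment, or the fragment itself
     * when it has no element children (for example, when the input was plain text).
     */
    static Node firstElementOrSelf(DocumentFragment fragment) {
        NodeList children = fragment.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // skip leading text, whitespace and comment nodes
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return child;
            }
        }
        return fragment;
    }
}
```

Returning only the first element discards any sibling nodes, so returning the whole fragment is the safer default when the input may contain several top-level nodes.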

Example 3: htmlToText

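import java.io.ByteArrayInputStream;

import org.apache.xerces.dom.CoreDocumentImpl;
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
import org.w3c.dom.DocumentFragment;
import org.xml.sax.InputSource;

/**
 * Strips the tags from HTML-formatted text.
 * @param html
 *            the HTML string
 * @return String
 * 			  the text with HTML tags removed; the empty string "" if html is null
 * 			  or cannot be parsed
 */
private String htmlToText(String html) {
	if (html == null) {
		return "";
	}
	DOMFragmentParser parser = new DOMFragmentParser();
	CoreDocumentImpl codeDoc = new CoreDocumentImpl();
	DocumentFragment doc = codeDoc.createDocumentFragment();

	try {
		// encode with textCharset (a String field of MessageParser), the same charset that is
		// declared to the parser; getBytes() with no argument would use the platform default
		InputSource inSource = new InputSource(new ByteArrayInputStream(html.getBytes(textCharset)));
		inSource.setEncoding(textCharset);
		parser.parse(inSource, doc);
	} catch (Exception e) {
		return "";
	}

	// textBuffer and processNode(Node) are members of MessageParser (not shown)
	textBuffer = new StringBuffer();
	processNode(doc);
	return textBuffer.toString();
}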
 
Developer ID: heartsome, Project: translationstudio8, Lines of code: 28, Source: MessageParser.java
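Example 3 declares textCharset on the InputSource, so the bytes in the stream must be encoded with that same charset; calling html.getBytes() with no argument would use the platform default instead. A small hypothetical helper (HtmlInputSources.fromString is an illustrative name) that derives both from a single Charset object keeps them from drifting apart:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;

import org.xml.sax.InputSource;

/** Hypothetical helper: builds a byte-stream InputSource whose declared encoding matches its bytes. */
final class HtmlInputSources {

    private HtmlInputSources() {
    }

    static InputSource fromString(String html, Charset charset) {
        // encode and declare with the same Charset so the two cannot disagree
        InputSource source = new InputSource(new ByteArrayInputStream(html.getBytes(charset)));
        source.setEncoding(charset.name());
        return source;
    }
}
```

When the HTML is already a String, passing a StringReader, as Examples 2, 4 and 5 do, avoids the byte and charset step entirely.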

Example 4: parse

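import java.io.IOException;
import java.io.StringReader;
import org.apache.html.dom.HTMLDocumentImpl;
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.html.HTMLDocument;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static Node parse(String content) throws SAXException, IOException {
  DOMFragmentParser parser = new DOMFragmentParser();
  HTMLDocument document = new HTMLDocumentImpl();
  DocumentFragment fragment = document.createDocumentFragment();

  InputSource is = new InputSource(new StringReader(content));
  parser.parse(is, fragment);
  return fragment;
}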
 
Developer ID: bsspirit, Project: kettle-4.4.0-stable, Lines of code: 10, Source: CarteTest.java
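The fragment returned by parse in Examples 4 and 5 is a plain org.w3c.dom.DocumentFragment, and that interface declares no getElementsByTagName method, so callers have to walk it themselves. A minimal sketch of such a walk follows; the FragmentSearch class is hypothetical. It matches names case-insensitively because NekoHTML upper-cases element names by default (the http://cyberneko.org/html/properties/names/elems property), though that default is worth checking against the NekoHTML version in use.

```java
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: getElementsByTagName-style search that also works on a DocumentFragment. */
final class FragmentSearch {

    private FragmentSearch() {
    }

    /** Returns all descendant elements of root whose name equals tagName ignoring case, in document order. */
    static List<Element> findElements(Node root, String tagName) {
        List<Element> result = new ArrayList<>();
        collect(root, tagName, result);
        return result;
    }

    private static void collect(Node node, String tagName, List<Element> result) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // pre-order: record the element before descending into it
                if (child.getNodeName().equalsIgnoreCase(tagName)) {
                    result.add((Element) child);
                }
                collect(child, tagName, result);
            }
        }
    }
}
```

For example, FragmentSearch.findElements(parse(html), "td") would return every table cell in the parsed fragment, whichever case the parser chose for element names.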

Example 5: parse

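import java.io.IOException;
import java.io.StringReader;
import org.apache.html.dom.HTMLDocumentImpl;
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.html.HTMLDocument;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public static Node parse( String content ) throws SAXException, IOException {
  DOMFragmentParser parser = new DOMFragmentParser();
  HTMLDocument document = new HTMLDocumentImpl();
  DocumentFragment fragment = document.createDocumentFragment();

  InputSource is = new InputSource( new StringReader( content ) );
  parser.parse( is, fragment );
  return fragment;
}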
 
Developer ID: pentaho, Project: pentaho-kettle, Lines of code: 10, Source: CarteIT.java

Example 6: setup

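import java.io.ByteArrayInputStream;
import java.net.URL;

import org.apache.html.dom.HTMLDocumentImpl;
import org.apache.nutch.parse.Outlink;
import org.apache.nutch.util.NutchConfiguration;
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
import org.junit.Assert;
import org.junit.Before;
import org.w3c.dom.DocumentFragment;
import org.xml.sax.InputSource;

// DOMContentUtils (the class under test) is assumed to live in this test's own package;
// conf, utils, testPages, testBaseHrefs, testBaseHrefURLs, testDOMs and answerOutlinks
// are fields of the test class (not shown)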
@Before
public void setup() throws Exception {
  conf = NutchConfiguration.create();
  conf.setBoolean("parser.html.form.use_action", true);
  utils = new DOMContentUtils(conf);
  DOMFragmentParser parser = new DOMFragmentParser();
  parser.setFeature(
      "http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe",
      true);
  for (int i = 0; i < testPages.length; i++) {
    DocumentFragment node = new HTMLDocumentImpl().createDocumentFragment();
    try {
      parser.parse(
          new InputSource(new ByteArrayInputStream(testPages[i].getBytes())),
          node);
      testBaseHrefURLs[i] = new URL(testBaseHrefs[i]);
    } catch (Exception e) {
      Assert.fail("caught exception: " + e);
    }
    testDOMs[i] = node;
  }
  answerOutlinks = new Outlink[][] {
      { new Outlink("http://www.nutch.org", "anchor"), },
      { new Outlink("http://www.nutch.org/", "home"),
          new Outlink("http://www.nutch.org/docs/bot.html", "bots"), },
      { new Outlink("http://www.nutch.org/", "separate this"),
          new Outlink("http://www.nutch.org/docs/ok", "from this"), },
      { new Outlink("http://www.nutch.org/", "home"),
          new Outlink("http://www.nutch.org/docs/1", "1"),
          new Outlink("http://www.nutch.org/docs/2", "2"), },
      { new Outlink("http://www.nutch.org/frames/top.html", ""),
          new Outlink("http://www.nutch.org/frames/left.html", ""),
          new Outlink("http://www.nutch.org/frames/invalid.html", ""),
          new Outlink("http://www.nutch.org/frames/right.html", ""), },
      { new Outlink("http://www.nutch.org/maps/logo.gif", ""),
          new Outlink("http://www.nutch.org/index.html", ""),
          new Outlink("http://www.nutch.org/maps/#bottom", ""),
          new Outlink("http://www.nutch.org/bot.html", ""),
          new Outlink("http://www.nutch.org/docs/index.html", ""), },
      { new Outlink("http://www.nutch.org/index.html", "whitespace test"), },
      {},
      { new Outlink("http://www.nutch.org/dummy.jsp", "test2"), },
      {},
      { new Outlink("http://www.nutch.org/;x", "anchor1"),
          new Outlink("http://www.nutch.org/g;x", "anchor2"),
          new Outlink("http://www.nutch.org/g;x?y#s", "anchor3") },
      {
          // this is tricky - see RFC3986 section 5.4.1 example 7
          new Outlink("http://www.nutch.org/g", "anchor1"),
          new Outlink("http://www.nutch.org/g?y#s", "anchor2"),
          new Outlink("http://www.nutch.org/;something?y=1", "anchor3"),
          new Outlink("http://www.nutch.org/;something?y=1#s", "anchor4"),
          new Outlink("http://www.nutch.org/;something?y=1;somethingelse",
              "anchor5") } };

}
 
Developer ID: jorcox, Project: GeoCrawler, Lines of code: 57, Source: TestDOMContentUtils.java
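The Nutch test above checks the outlinks that DOMContentUtils extracts from each parsed page. The sketch below is a much-simplified, hypothetical version of that idea using only the JDK DOM API: it collects the href and the whitespace-collapsed anchor text of every A element that has an href. Unlike Nutch, it does not resolve relative URLs against a base URL (the RFC 3986 cases the test exercises), and SimpleLink and LinkExtractor are illustrative names, not Nutch's Outlink class.

```java
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical (href, anchor text) pair; not Nutch's Outlink. */
final class SimpleLink {
    final String href;
    final String anchor;

    SimpleLink(String href, String anchor) {
        this.href = href;
        this.anchor = anchor;
    }

    @Override
    public String toString() {
        return href + " [" + anchor + "]";
    }
}

/** Hypothetical, simplified link extractor; relative hrefs are returned unresolved. */
final class LinkExtractor {

    private LinkExtractor() {
    }

    static List<SimpleLink> extractLinks(Node root) {
        List<SimpleLink> links = new ArrayList<>();
        walk(root, links);
        return links;
    }

    private static void walk(Node node, List<SimpleLink> links) {
        // element names compared ignoring case (NekoHTML upper-cases them by default);
        // the attribute lookup assumes lower-case attribute names, NekoHTML's default
        if (node.getNodeType() == Node.ELEMENT_NODE && "a".equalsIgnoreCase(node.getNodeName())) {
            Element a = (Element) node;
            if (a.hasAttribute("href")) {
                // collapse runs of whitespace in the anchor text to single spaces
                String anchor = a.getTextContent().trim().replaceAll("\\s+", " ");
                links.add(new SimpleLink(a.getAttribute("href"), anchor));
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), links);
        }
    }
}
```

Resolving each href against the page's base URL, as the Nutch test expects, would be a separate step layered on top of this traversal.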


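Note: The org.cyberneko.html.parsers.DOMFragmentParser examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippets come from open-source projects contributed by their developers, and copyright remains with the original authors; consult each project's license before distributing or using the code. Do not reproduce this article without permission.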