Java HTMLStripCharFilter Class Code Examples

This article collects typical usage examples of the Java class org.apache.lucene.analysis.charfilter.HTMLStripCharFilter. If you are wondering how the class is used in practice, the selected examples below may help.


The HTMLStripCharFilter class belongs to the org.apache.lucene.analysis.charfilter package. Six code examples of the class are shown below, sorted by popularity by default.

Example 1: filter

import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter; // import the required package/class
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

private String filter(String value) {
	StringBuilder out = new StringBuilder();
	StringReader strReader = new StringReader(value);
	// try-with-resources closes the filter even when read() throws
	try (HTMLStripCharFilter html = new HTMLStripCharFilter(new BufferedReader(strReader))) {
		char[] cbuf = new char[1024 * 10];
		while (true) {
			int count = html.read(cbuf);
			if (count == -1)
				break; // read() returns -1 at end of stream
			if (count > 0)
				out.append(cbuf, 0, count);
		}
	} catch (IOException e) {
		throw new RuntimeException("Failed stripping HTML for value: "
				+ value, e);
	}
	return out.toString();
}
 
Author: RBGKew, Project: eMonocot, Lines of code: 21, Source: SearchableDaoImpl.java
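The read loop in Example 1 does not depend on Lucene: HTMLStripCharFilter is a java.io.Reader, so the same pattern drains any reader. The following is a minimal sketch of that loop using only the standard library. The class name ReaderDrainSketch and the method drain are hypothetical helpers introduced for illustration, and a plain StringReader stands in for the filter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of Lucene or of the projects quoted in this article.
class ReaderDrainSketch {

    // Reads everything from the reader into a String and closes the reader afterwards.
    static String drain(Reader reader) {
        StringBuilder out = new StringBuilder();
        // try-with-resources closes the reader even if read() throws.
        try (Reader in = reader) {
            char[] cbuf = new char[1024];
            int count;
            // read(char[]) returns -1 at end of stream.
            while ((count = in.read(cbuf)) != -1) {
                out.append(cbuf, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to drain reader", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: with Lucene on the classpath, one would pass
        // new HTMLStripCharFilter(new StringReader(html)) here instead.
        System.out.println(drain(new StringReader("plain text stands in for filtered output")));
    }
}
```

Because the helper closes whatever reader it receives, it suits callers that own the reader; a caller that wants to keep the underlying reader open would need a variant without the try-with-resources block.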

Example 2: stripHTML

import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter; // import the required package/class
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

private Object stripHTML(String value, String column) {
  StringBuilder out = new StringBuilder();
  StringReader strReader = new StringReader(value);
  // try-with-resources closes the filter even when read() throws
  try (HTMLStripCharFilter html = new HTMLStripCharFilter(
          strReader.markSupported() ? strReader : new BufferedReader(strReader))) {
    char[] cbuf = new char[1024 * 10];
    while (true) {
      int count = html.read(cbuf);
      if (count == -1)
        break; // read() returns -1 at end of stream
      if (count > 0)
        out.append(cbuf, 0, count);
    }
  } catch (IOException e) {
    throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
            "Failed stripping HTML for column: " + column, e);
  }
  return out.toString();
}
 
Author: europeana, Project: search, Lines of code: 21, Source: HTMLStripTransformer.java
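Example 2 wraps the reader in a BufferedReader only when it does not already support mark/reset. A StringReader does support it, so in that example the wrap is skipped. The sketch below isolates the check using only the standard library; the class MarkSupportSketch and the method ensureMarkSupported are hypothetical names introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class illustrating the conditional wrap used in Example 2.
class MarkSupportSketch {

    // Returns the reader itself if it supports mark/reset,
    // otherwise a BufferedReader around it (BufferedReader supports mark/reset).
    static Reader ensureMarkSupported(Reader reader) {
        return reader.markSupported() ? reader : new BufferedReader(reader);
    }

    public static void main(String[] args) {
        StringReader plain = new StringReader("<b>text</b>");
        // StringReader already supports mark/reset, so it is returned unchanged.
        System.out.println(ensureMarkSupported(plain) == plain);
    }
}
```

Readers that inherit the default markSupported() implementation, such as an InputStreamReader, report false and would be wrapped.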

Example 3: analyzeReturnTokens

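import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter; // import the required package/class
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
// The IOUtils import is omitted: the source does not show which library provides it.
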
private String[] analyzeReturnTokens(String docText) {
  List<String> result = new ArrayList<>();

  Reader filter = new HTMLStripCharFilter(new StringReader(docText),
          Collections.singleton("unescaped"));
  WhitespaceTokenizer ts = new WhitespaceTokenizer();
  final CharTermAttribute termAttribute = ts.addAttribute(CharTermAttribute.class);
  try {
    ts.setReader(filter);
    ts.reset();
    while (ts.incrementToken()) {
      result.add(termAttribute.toString());
    }
    ts.end();
  } catch (IOException e) {
    throw new RuntimeException(e);
  } finally {
    IOUtils.closeQuietly(ts);
  }
  return result.toArray(new String[result.size()]);
}
 
Author: OpenSextant, Project: SolrTextTagger, Lines of code: 22, Source: XmlInterpolationTest.java

Example 4: filterHTML

import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter; // import the required package/class
import java.io.IOException;
import java.io.Reader;

public static String filterHTML(Reader source) throws IOException {
    if (source == null) {
        return null;
    }
    StringBuilder builder = new StringBuilder();
    HTMLStripCharFilter reader = new HTMLStripCharFilter(source);
    int ch;
    while ((ch = reader.read()) != -1) {
        builder.append((char) ch);
    }
    return builder.toString();
}
 
Author: smalldirector, Project: solr-multilingual-analyzer, Lines of code: 13, Source: HTMLScriptCharFilterHelper.java

Example 5: analyzeTagOne

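import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter; // import the required package/class
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
// The IOUtils import is omitted: the source does not show which library provides it.
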
private int[] analyzeTagOne(String docText, String start, String end) {
  int[] result = {-1, -1};

  Reader filter = new HTMLStripCharFilter(new StringReader(docText));

  WhitespaceTokenizer ts = new WhitespaceTokenizer();
  final CharTermAttribute termAttribute = ts.addAttribute(CharTermAttribute.class);
  final OffsetAttribute offsetAttribute = ts.addAttribute(OffsetAttribute.class);
  try {
    ts.setReader(filter);
    ts.reset();
    while (ts.incrementToken()) {
      final String termString = termAttribute.toString();
      if (termString.equals(start))
        result[0] = offsetAttribute.startOffset();
      if (termString.equals(end)) {
        result[1] = offsetAttribute.endOffset();
        return result;
      }
    }
    ts.end();
  } catch (IOException e) {
    throw new RuntimeException(e);
  } finally {
    IOUtils.closeQuietly(ts);
  }
  return result;
}
 
Author: OpenSextant, Project: SolrTextTagger, Lines of code: 29, Source: XmlInterpolationTest.java

Example 6: create

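import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter; // import the required package/class
import java.io.Reader;

// escapedTags is a field of the enclosing factory class and is not shown in this excerpt.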
@Override
public Reader create(Reader tokenStream) {
    return new HTMLStripCharFilter(tokenStream, escapedTags);
}
 
Author: justor, Project: elasticsearch_my, Lines of code: 5, Source: HtmlStripCharFilterFactory.java


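Note: The org.apache.lucene.analysis.charfilter.HTMLStripCharFilter examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright in the source code belongs to the original authors; consult each project's License before distributing or using the code. Do not reproduce without permission.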