

Java LengthFilter Class Code Examples

This article collects typical usage examples of the Java class org.apache.lucene.analysis.miscellaneous.LengthFilter. If you are wondering what the LengthFilter class is for, or how it is used in practice, the selected code examples below may help.


The LengthFilter class belongs to the org.apache.lucene.analysis.miscellaneous package. Six code examples of the class are shown below, sorted by popularity by default.
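
Before the project-specific examples, here is a minimal, self-contained sketch (not taken from any of the projects below, and assuming the Lucene 5.x-style Tokenizer API that Example 3 also uses) of what LengthFilter does: it silently drops every token whose character length falls outside the [min, max] range.

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.LengthFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LengthFilterDemo {
    public static void main(String[] args) throws IOException {
        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader("a an ant antelope hippopotamus"));

        // Keep only tokens that are between 3 and 8 characters long.
        try (TokenStream stream = new LengthFilter(tokenizer, 3, 8)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                System.out.println(term.toString()); // prints "ant" and "antelope"
            }
            stream.end();
        }
    }
}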

Example 1: create

import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class
@Override
public TokenStream create(TokenStream tokenStream) {
    if (version.onOrAfter(Version.LUCENE_4_4)) {
        return new LengthFilter(tokenStream, min, max);
    } else {
        @SuppressWarnings("deprecation")
        final TokenStream filter = new Lucene43LengthFilter(enablePositionIncrements, tokenStream, min, max);
        return filter;
    }
}
 
Developer: baidu, Project: Elasticsearch, Lines of code: 11, Source file: LengthTokenFilterFactory.java
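
Note on the version check: from Lucene 4.4 onward, LengthFilter always preserves position increments for the tokens it removes, so the enablePositionIncrements flag survives only on the deprecated Lucene43LengthFilter back-compat path taken in the else branch for pre-4.4 indices.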

Example 2: transform

import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class
public Tuple2<Double, Multiset<String>> transform(Row row) throws IOException {
	Double label = row.getDouble(1);
	StringReader document = new StringReader(row.getString(0).replaceAll("br2n", ""));
	List<String> wordsList = new ArrayList<>();

	try (BulgarianAnalyzer analyzer = new BulgarianAnalyzer(BULGARIAN_STOP_WORDS_SET)) {
		TokenStream stream = analyzer.tokenStream("words", document);

		TokenFilter lowerFilter = new LowerCaseFilter(stream);
		TokenFilter numbers = new NumberFilter(lowerFilter);
		TokenFilter length = new LengthFilter(numbers, 3, 1000);
		TokenFilter stemmer = new BulgarianStemFilter(length);
		TokenFilter ngrams = new ShingleFilter(stemmer, 2, 3);

		try (TokenFilter filter = ngrams) {
			Attribute termAtt = filter.addAttribute(CharTermAttribute.class);
			filter.reset();
			while (filter.incrementToken()) {
				String word = termAtt.toString().replace(",", "(comma)").replaceAll("\n|\r", "");
				if (word.contains("_")) {
					continue;
				}
				wordsList.add(word);
			}
		}
	}

	Multiset<String> words = ConcurrentHashMultiset.create(wordsList);

	return new Tuple2<>(label, words);
}
 
Developer: mhardalov, Project: news-credibility, Lines of code: 32, Source file: TokenTransform.java
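
Note the order of the chain in this example: lower-casing and the project-specific NumberFilter run first, the LengthFilter(numbers, 3, 1000) step then discards tokens shorter than 3 or longer than 1000 characters, and only the surviving tokens are stemmed and combined into 2- and 3-word shingles. The later check for "_" skips shingles containing ShingleFilter's default filler token, which marks positions where a token was removed earlier in the chain.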

Example 3: main

import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class
public static void main(String[] args) throws IOException {
	System.out.println(NumberUtils.isDigits("12345"));
	System.out.println(NumberUtils.isDigits("12345.1"));
	System.out.println(NumberUtils.isDigits("12345,2"));
	
	System.out.println(NumberUtils.isNumber("12345"));
	System.out.println(NumberUtils.isNumber("12345.1"));
	System.out.println(NumberUtils.isNumber("12345,2".replace(",", ".")));
	System.out.println(NumberUtils.isNumber("12345,2"));
	StringReader input = new StringReader(
			"Правя тест на класификатор и после др.Дулитъл, пада.br2n ще се оправя с данните! които,са много зле. Но това е по-добре. Но24"
					.replaceAll("br2n", ""));

	LetterTokenizer tokenizer = new LetterTokenizer();
	tokenizer.setReader(input);

	TokenFilter stopFilter = new StopFilter(tokenizer, BULGARIAN_STOP_WORDS_SET);
	TokenFilter length = new LengthFilter(stopFilter, 3, 1000);
	TokenFilter stemmer = new BulgarianStemFilter(length);
	TokenFilter ngrams = new ShingleFilter(stemmer, 2, 2);

	try (TokenFilter filter = ngrams) {

		Attribute termAtt = filter.addAttribute(CharTermAttribute.class);
		filter.reset();
		while (filter.incrementToken()) {
			String word = termAtt.toString().replaceAll(",", "\\.").replaceAll("\n|\r", "");
			System.out.println(word);
		}
	}
}
 
Developer: mhardalov, Project: news-credibility, Lines of code: 32, Source file: EgdeMain.java
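
Assuming standard commons-lang NumberUtils semantics, the three isDigits checks print true, false, false (isDigits accepts only plain digit strings) and the four isNumber checks print true, true, true, false (a decimal comma is not accepted until it is replaced by a dot). The rest of main() runs roughly the same Bulgarian analysis chain as Example 2, but with a StopFilter in place of the custom NumberFilter and with 2-word shingles only.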

Example 4: create

import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class
@Override
public TokenStream create(TokenStream tokenStream) {
    return new LengthFilter(tokenStream, min, max);
}
 
Developer: justor, Project: elasticsearch_my, Lines of code: 5, Source file: LengthTokenFilterFactory.java
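
This is the same factory method as in Example 1, but from a codebase that appears to target Lucene 4.4 or later only, so the back-compat branch is gone and the (tokenStream, min, max) constructor is used unconditionally.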

Example 5: wrapComponents

import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class
@Override
protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
	TokenStream ts = components.getTokenStream();
	LengthFilter drop_long_tokens = new LengthFilter(ts, 0, 1024);
	return new TokenStreamComponents(components.getTokenizer(), drop_long_tokens);
}
 
Developer: isoboroff, Project: basekb-search, Lines of code: 7, Source file: SafetyAnalyzer.java
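
For context, wrapComponents() is a hook defined by Lucene's AnalyzerWrapper. Below is a hedged sketch of a complete wrapper analyzer built around the same idea; the class name, constructor, and delegate field are assumptions for illustration, not the original SafetyAnalyzer source, and it assumes the same Lucene version as the example above (one that still exposes TokenStreamComponents.getTokenizer()).

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.miscellaneous.LengthFilter;

public class DropLongTokensAnalyzer extends AnalyzerWrapper {
    private final Analyzer delegate;

    public DropLongTokensAnalyzer(Analyzer delegate) {
        super(delegate.getReuseStrategy());
        this.delegate = delegate;
    }

    @Override
    protected Analyzer getWrappedAnalyzer(String fieldName) {
        return delegate;
    }

    @Override
    protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
        // Drop any token longer than 1024 characters, whatever the wrapped analyzer produced.
        TokenStream filtered = new LengthFilter(components.getTokenStream(), 0, 1024);
        return new TokenStreamComponents(components.getTokenizer(), filtered);
    }
}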

Example 6: create

import org.apache.lucene.analysis.miscellaneous.LengthFilter; // import the required package/class
@Override
public LengthFilter create(TokenStream input) {
  return new LengthFilter(enablePositionIncrements, input, min, max);
}
 
Developer: pkarmstr, Project: NYBC, Lines of code: 5, Source file: LengthFilterFactory.java
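
Unlike the previous examples, this factory uses the older LengthFilter constructor that still takes an enablePositionIncrements flag; that constructor was deprecated in Lucene 4.4 and removed in later releases, so this snippet only compiles against older Lucene versions.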


Note: The org.apache.lucene.analysis.miscellaneous.LengthFilter class examples in this article were collected by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippets are taken from open-source projects contributed by various developers, and copyright of the source code remains with the original authors; please refer to each project's license before distributing or using the code. Do not reproduce this article without permission.