This article collects typical usage examples of the Java class org.carrot2.text.analysis.ExtendedWhitespaceTokenizer. If you are wondering what the ExtendedWhitespaceTokenizer class is for, how to use it, or what real-world usage looks like, the selected examples below may help.
The ExtendedWhitespaceTokenizer class belongs to the org.carrot2.text.analysis package. Four code examples of the class are shown below, sorted by popularity by default.
Example 1: getTokenizer
import org.carrot2.text.analysis.ExtendedWhitespaceTokenizer; // import the required package/class

@Override
public ITokenizer getTokenizer(LanguageCode language) {
    switch (language) {
        case CHINESE_SIMPLIFIED:
            return ChineseTokenizerFactory.createTokenizer();

        /*
         * We use our own analyzer for Arabic. Lucene's version has special
         * support for Nonspacing-Mark characters (see
         * http://www.fileformat.info/info/unicode/category/Mn/index.htm), but we
         * have them included as letters in the parser.
         */
        case ARABIC:
            // Intentional fall-through.
        default:
            return new ExtendedWhitespaceTokenizer();
    }
}
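Example 2: getTokenizer
import java.io.IOException;
import java.io.Reader;
import org.carrot2.text.analysis.ExtendedWhitespaceTokenizer; // import the required package/class

@Override
public ITokenizer getTokenizer(LanguageCode language) {
    // Anonymous wrapper that delegates all work to an ExtendedWhitespaceTokenizer.
    return new ITokenizer() {
        private final ExtendedWhitespaceTokenizer delegate = new ExtendedWhitespaceTokenizer();

        @Override
        public void setTermBuffer(MutableCharArray buffer) {
            delegate.setTermBuffer(buffer);
            // Replace the token image with itself repeated twice ("abc" becomes "abcabc").
            // This looks deliberate, presumably to make the custom tokenizer's effect observable.
            buffer.reset(buffer.toString() + buffer.toString());
        }

        @Override
        public void reset(Reader input) {
            delegate.reset(input);
        }

        @Override
        public short nextToken() throws IOException {
            return delegate.nextToken();
        }
    };
}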
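Example 3: createTokenizer
import org.carrot2.text.analysis.ExtendedWhitespaceTokenizer; // import the required package/class

static ITokenizer createTokenizer() {
    try {
        return new ChineseTokenizer();
    } catch (Throwable e) {
        // Never mask memory exhaustion; let it propagate to the caller.
        if (e instanceof OutOfMemoryError) {
            throw (OutOfMemoryError) e;
        }
        // Any other failure: fall back to whitespace-based tokenization.
        return new ExtendedWhitespaceTokenizer();
    }
}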
Example 4: createTokenizer
import org.carrot2.text.analysis.ExtendedWhitespaceTokenizer; // import the required package/class

static ITokenizer createTokenizer() {
    try {
        return new ChineseTokenizer();
    } catch (Throwable e) {
        // Falls back on any failure, including Errors such as OutOfMemoryError.
        return new ExtendedWhitespaceTokenizer();
    }
}
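The two createTokenizer variants differ only in how they treat OutOfMemoryError: one rethrows it, the other swallows it along with every other Throwable. The following is a minimal, self-contained sketch of the rethrowing variant of that fallback pattern. It does not use Carrot2: the Tokenizer interface, the FallbackDemo class, and the create method are hypothetical stand-ins introduced only for illustration.

```java
import java.util.function.Supplier;

public class FallbackDemo {
    /** Hypothetical stand-in for a tokenizer interface; not part of Carrot2. */
    interface Tokenizer {
        String name();
    }

    /**
     * Tries the primary supplier first. If it fails with anything other than
     * OutOfMemoryError, returns the result of the fallback supplier instead.
     */
    static Tokenizer create(Supplier<Tokenizer> primary, Supplier<Tokenizer> fallback) {
        try {
            return primary.get();
        } catch (OutOfMemoryError e) {
            // Memory exhaustion is not something a fallback can fix; propagate it.
            throw e;
        } catch (Throwable t) {
            // Any other failure (for instance a missing optional class) triggers the fallback.
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        Tokenizer t = create(
            () -> { throw new NoClassDefFoundError("simulated missing dependency"); },
            () -> () -> "whitespace");
        System.out.println(t.name());
    }
}
```

Listing the OutOfMemoryError clause before the Throwable clause gives the same behavior as the instanceof check in Example 3 without a cast. Whether to catch Throwable at all is a design choice: it also intercepts linkage errors, which is useful when the primary implementation depends on classes that may be absent at runtime.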