

Java BasicPreprocessingPipelineDescriptor class code example

This article collects a typical usage example of org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipelineDescriptor in Java. If you are wondering how the BasicPreprocessingPipelineDescriptor class is used in practice, the selected example below may help.


The BasicPreprocessingPipelineDescriptor class belongs to the org.carrot2.text.preprocessing.pipeline package. One code example of the class is shown below.

Example 1: init

import org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipelineDescriptor; // import the required package/class
@Override
@SuppressWarnings({ "unchecked", "rawtypes" })
public String init(NamedList config, final SolrCore core) {
  this.core = core;

  String result = super.init(config, core);
  final SolrParams initParams = SolrParams.toSolrParams(config);

  // Initialize Carrot2 controller. Pass initialization attributes, if any.
  HashMap<String, Object> initAttributes = new HashMap<String, Object>();
  extractCarrotAttributes(initParams, initAttributes);

  // Customize the stemmer and tokenizer factories. The implementations we provide here
  // are included in the code base of Solr, so that it's possible to refactor
  // the Lucene APIs the factories rely on if needed.
  // Additionally, we set a custom lexical resource factory for Carrot2 that
  // will use both Carrot2 default stop words as well as stop words from
  // the StopFilter defined on the field.
  final AttributeBuilder attributeBuilder = BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes);
  attributeBuilder.lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class);
  if (!initAttributes.containsKey(BasicPreprocessingPipelineDescriptor.Keys.TOKENIZER_FACTORY)) {
    attributeBuilder.tokenizerFactory(LuceneCarrot2TokenizerFactory.class);
  }
  if (!initAttributes.containsKey(BasicPreprocessingPipelineDescriptor.Keys.STEMMER_FACTORY)) {
    attributeBuilder.stemmerFactory(LuceneCarrot2StemmerFactory.class);
  }

  // Pass the schema to SolrStopwordsCarrot2LexicalDataFactory.
  initAttributes.put("solrIndexSchema", core.getSchema());

  // Customize Carrot2's resource lookup to first look for resources
  // using Solr's resource loader. If that fails, try loading from the classpath.
  DefaultLexicalDataFactoryDescriptor.attributeBuilder(initAttributes).resourceLookup(
    new ResourceLookup(
      // Solr-specific resource loading.
      new SolrResourceLocator(core, initParams),
      // Using the class loader directly because this time we want to omit the prefix
      new ClassLoaderLocator(core.getResourceLoader().getClassLoader())));

  // Carrot2 uses current thread's context class loader to get
  // certain classes (e.g. custom tokenizer/stemmer) at initialization time.
  // To make sure classes from contrib JARs are available,
  // we swap the context class loader for the time of clustering.
  Thread ct = Thread.currentThread();
  ClassLoader prev = ct.getContextClassLoader();
  try {
    ct.setContextClassLoader(core.getResourceLoader().getClassLoader());
    this.controller.init(initAttributes);
  } finally {
    ct.setContextClassLoader(prev);
  }

  SchemaField uniqueField = core.getSchema().getUniqueKeyField();
  if (uniqueField == null) {
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, 
        CarrotClusteringEngine.class.getSimpleName() + " requires the schema to have a uniqueKeyField");
  }
  this.idFieldName = uniqueField.getName();

  // Make sure the requested Carrot2 clustering algorithm class is available
  String carrotAlgorithmClassName = initParams.get(CarrotParams.ALGORITHM);
  this.clusteringAlgorithmClass = core.getResourceLoader().findClass(carrotAlgorithmClassName, IClusteringAlgorithm.class);
  return result;
}
 
Developer: pkarmstr, project: NYBC, lines of code: 65, source file: CarrotClusteringEngine.java
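
The init method above swaps the thread's context class loader around the call to controller.init and restores it in a finally block. The same pattern can be sketched in isolation as follows. This is a minimal illustration using only the JDK; the class name ContextClassLoaderSwap and the helper withContextClassLoader are hypothetical and are not part of Solr or Carrot2.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Solr or Carrot2: illustrates the
// "swap the context class loader, then restore it" pattern only.
public final class ContextClassLoaderSwap {
    private ContextClassLoaderSwap() {}

    /**
     * Runs the action with the given loader installed as the current thread's
     * context class loader, and restores the previous loader afterwards,
     * even if the action throws.
     */
    public static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try {
            current.setContextClassLoader(loader);
            return action.get();
        } finally {
            // Always restore, so later code on this thread is unaffected.
            current.setContextClassLoader(previous);
        }
    }
}
```

A caller would wrap the initialization step in the supplier, for example `withContextClassLoader(someLoader, () -> { /* init */ return null; })`. The finally block matters because an exception during initialization would otherwise leave the thread with the swapped-in loader.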
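
The init method also installs the Lucene-based tokenizer and stemmer factories only when the init attributes do not already contain the corresponding key, so explicitly configured values take precedence. The sketch below shows that "default only if absent" check on a plain map. The class AttributeDefaults, the key strings, and the string values are placeholders assumed for illustration; they are not the real Carrot2 attribute keys or factory classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "apply a default only when the key is absent".
// Keys and values are placeholders, not real Carrot2 attribute keys.
public final class AttributeDefaults {
    public static final String TOKENIZER_FACTORY = "example.tokenizerFactory";
    public static final String STEMMER_FACTORY = "example.stemmerFactory";

    private AttributeDefaults() {}

    /** Returns a copy of the configured attributes with missing defaults filled in. */
    public static Map<String, Object> withDefaults(Map<String, Object> configured) {
        Map<String, Object> result = new HashMap<>(configured);
        // containsKey mirrors the check in the example: a key that is present
        // is respected, whatever its value.
        if (!result.containsKey(TOKENIZER_FACTORY)) {
            result.put(TOKENIZER_FACTORY, "DefaultTokenizer");
        }
        if (!result.containsKey(STEMMER_FACTORY)) {
            result.put(STEMMER_FACTORY, "DefaultStemmer");
        }
        return result;
    }
}
```

Note that containsKey differs from Map.putIfAbsent when a key is mapped to null: containsKey leaves the null mapping alone, while putIfAbsent would replace it.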


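Note: The org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipelineDescriptor class example in this article was compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippet was selected from open-source projects, and copyright of the source code belongs to the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.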