

Java HtmlParseFilter Class Code Examples

This article collects typical usage examples of the Java class org.apache.nutch.parse.HtmlParseFilter. If you are wondering how the HtmlParseFilter class is used in practice, the example here may help.


The HtmlParseFilter class belongs to the org.apache.nutch.parse package. This article shows one code example that uses it.

Example 1: setConf

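import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.parse.HtmlParseFilter; // the class this example is about
import org.apache.nutch.plugin.Extension;
import org.apache.nutch.plugin.PluginRepository;

// conf, attributeFile, regexFile and LOG are members of the enclosing class (not shown in this excerpt).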
public void setConf(Configuration conf) {
  this.conf = conf;

  // find this plugin's extension and read its "file" attribute from plugin.xml
  String pluginName = "parsefilter-regex";
  Extension[] extensions = PluginRepository.get(conf).getExtensionPoint(
    HtmlParseFilter.class.getName()).getExtensions();
  for (int i = 0; i < extensions.length; i++) {
    Extension extension = extensions[i];
    if (extension.getDescriptor().getPluginId().equals(pluginName)) {
      attributeFile = extension.getAttribute("file");
      break;
    }
  }

  // treat a blank (whitespace-only) attribute value as undefined
  if (attributeFile != null && attributeFile.trim().equals("")) {
    attributeFile = null;
  }

  if (attributeFile != null) {
    if (LOG.isInfoEnabled()) {
      LOG.info("Attribute \"file\" is defined for plugin " + pluginName
        + " as " + attributeFile);
    }
  }
  else {
    if (LOG.isWarnEnabled()) {
      LOG.warn("Attribute \"file\" is not defined in plugin.xml for plugin "
        + pluginName);
    }
  }

  // regexFile and the plugin.xml attribute "file" take precedence over the configured file, if defined
  String file = conf.get("parsefilter.regex.file");
  String stringRules = conf.get("parsefilter.regex.rules");
  if (regexFile != null) {
    file = regexFile;
  }
  else if (attributeFile != null) {
    file = attributeFile;
  }
  Reader reader = null;
  if (stringRules != null) { // takes precedence over files
    reader = new StringReader(stringRules);
  } else {
    reader = conf.getConfResourceAsReader(file);
  }
  try {
    if (reader == null) {
      reader = new FileReader(file);
    }
    readConfiguration(reader);
  }
  catch (IOException e) {
    LOG.error(org.apache.hadoop.util.StringUtils.stringifyException(e));
  }
}
 
Developer ID: jorcox, project: GeoCrawler, lines of code: 59, source file: RegexParseFilter.java
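
The order in which setConf picks its rule source can be easier to follow outside of Nutch. The following is a minimal sketch in plain Java, not the project's code: the class RuleSourceSketch, its helper methods, and the file name rules-from-conf.txt are all hypothetical and only mirror the precedence seen in the example (inline rules first, then an explicitly set file, then the plugin.xml attribute, then the configured file). Unlike the original method, it raises an IOException when neither inline rules nor a file name is available, and it does not attempt the configuration-resource lookup.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class; it only illustrates the precedence order.
public class RuleSourceSketch {

  /** Treats null or whitespace-only input as "not defined". */
  static String blankToNull(String value) {
    return (value == null || value.trim().isEmpty()) ? null : value;
  }

  /**
   * Picks the rule file name: an explicitly set file wins, then the
   * plugin.xml attribute (if not blank), then the configured value.
   */
  static String chooseFile(String regexFile, String attributeFile,
      String configuredFile) {
    if (regexFile != null) {
      return regexFile;
    }
    String attribute = blankToNull(attributeFile);
    return attribute != null ? attribute : configuredFile;
  }

  /** Inline rules take precedence over any file. */
  static Reader openRules(String inlineRules, String file) throws IOException {
    if (inlineRules != null) {
      return new StringReader(inlineRules);
    }
    if (file == null) {
      // Assumption of this sketch: fail clearly instead of passing null on.
      throw new IOException("no inline rules and no rule file configured");
    }
    return new FileReader(file);
  }

  public static void main(String[] args) throws IOException {
    // A blank attribute is ignored, so the configured name is used.
    String file = chooseFile(null, "   ", "rules-from-conf.txt");
    System.out.println(file);

    // Inline rules are present, so the file is never opened.
    try (BufferedReader reader =
        new BufferedReader(openRules("hypothetical inline rule", file))) {
      System.out.println(reader.readLine());
    }
  }
}
```

Keeping the choice of file name separate from opening the reader makes each precedence rule testable without touching the file system.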


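Note: The org.apache.nutch.parse.HtmlParseFilter example in this article was compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippet is taken from an open-source project and its copyright belongs to the original author; consult that project's License before redistributing or using it. Do not reproduce this article without permission.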