This page collects a typical usage example of org.apache.nutch.parse.HtmlParseFilter in Java. If you are wondering how HtmlParseFilter is used in practice, the example selected here may help.
HtmlParseFilter belongs to the org.apache.nutch.parse package and serves as an extension point in the Apache Nutch plugin system. One code example is shown below.
Example 1: setConf
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.parse.HtmlParseFilter; // import the required package/class
import org.apache.nutch.plugin.Extension;
import org.apache.nutch.plugin.PluginRepository;

// Note: conf, attributeFile, regexFile, LOG and readConfiguration(Reader)
// are members of the enclosing class, which this excerpt does not show.
public void setConf(Configuration conf) {
  this.conf = conf;

  // get the extensions registered for the HtmlParseFilter extension point
  // and read the "file" attribute declared for this plugin in plugin.xml
  String pluginName = "parsefilter-regex";
  Extension[] extensions = PluginRepository.get(conf).getExtensionPoint(
      HtmlParseFilter.class.getName()).getExtensions();
  for (int i = 0; i < extensions.length; i++) {
    Extension extension = extensions[i];
    if (extension.getDescriptor().getPluginId().equals(pluginName)) {
      attributeFile = extension.getAttribute("file");
      break;
    }
  }

  // treat a blank (whitespace-only) attribute value as undefined
  if (attributeFile != null && attributeFile.trim().equals("")) {
    attributeFile = null;
  }

  if (attributeFile != null) {
    if (LOG.isInfoEnabled()) {
      LOG.info("Attribute \"file\" is defined for plugin " + pluginName
          + " as " + attributeFile);
    }
  } else {
    if (LOG.isWarnEnabled()) {
      LOG.warn("Attribute \"file\" is not defined in plugin.xml for plugin "
          + pluginName);
    }
  }

  // regexFile, then the plugin attribute "file", take precedence over the
  // parsefilter.regex.file property if they are defined
  String file = conf.get("parsefilter.regex.file");
  String stringRules = conf.get("parsefilter.regex.rules");
  if (regexFile != null) {
    file = regexFile;
  } else if (attributeFile != null) {
    file = attributeFile;
  }

  Reader reader = null;
  if (stringRules != null) { // inline rules take precedence over files
    reader = new StringReader(stringRules);
  } else {
    reader = conf.getConfResourceAsReader(file);
  }

  try {
    if (reader == null) {
      // not found as a configuration resource; try the file system
      reader = new FileReader(file);
    }
    readConfiguration(reader);
  } catch (IOException e) {
    LOG.error(org.apache.hadoop.util.StringUtils.stringifyException(e));
  }
}
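
The order in which setConf picks its rule source can be hard to see in the full method, so here is a minimal standalone sketch of that precedence only. It does not use any Nutch or Hadoop classes. The class RuleSourceDemo, the helpers resolveFile and describeSource, and the file names are hypothetical and chosen for illustration; only the ordering (inline rules, then regexFile, then the plugin.xml attribute, then the configured file) is taken from the example above.

```java
class RuleSourceDemo {

  // Hypothetical helper mirroring the file precedence in setConf:
  // regexFile first, then the plugin.xml attribute, then the configured file.
  static String resolveFile(String regexFile, String attributeFile,
      String configuredFile) {
    // A blank (whitespace-only) attribute counts as undefined.
    if (attributeFile != null && attributeFile.trim().isEmpty()) {
      attributeFile = null;
    }
    if (regexFile != null) {
      return regexFile;
    }
    if (attributeFile != null) {
      return attributeFile;
    }
    return configuredFile; // may be null if nothing is configured
  }

  // Hypothetical helper: inline rules win over any file.
  static String describeSource(String stringRules, String file) {
    if (stringRules != null) {
      return "inline rules";
    }
    if (file != null) {
      return "file " + file;
    }
    return "nothing configured";
  }

  public static void main(String[] args) {
    // Placeholder file name; the blank attribute is ignored.
    String file = resolveFile(null, "   ", "rules-from-conf.txt");
    System.out.println(describeSource(null, file));
    System.out.println(describeSource("some rule text", file));
  }
}
```

The "nothing configured" branch is an addition of this sketch: the original method has no such case and passes a possibly null file name on to the readers.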