

Java DocumentSource Class Code Examples

This article collects typical usage examples of org.apache.any23.source.DocumentSource in Java. If you are wondering how the DocumentSource class is used in practice, the selected examples below may help.


The DocumentSource class belongs to the org.apache.any23.source package. Two code examples of the class are shown below, sorted by popularity by default.

Example 1: _extractTopicsFrom

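import java.net.URL;

import org.apache.any23.Any23;
import org.apache.any23.http.HTTPClient;
import org.apache.any23.source.DocumentSource; // the class this example demonstrates
import org.apache.any23.source.HTTPDocumentSource;
import org.apache.any23.writer.TripleHandler;
// TopicMap and TopicMapsCreator come from the Wandora project itself;
// tripletSource and namespace are fields of the enclosing extractor class.
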
@Override
public boolean _extractTopicsFrom(URL url, TopicMap topicMap) throws Exception {
    if(url != null) {
        tripletSource = url.toExternalForm();
        Any23 runner = new Any23();

        runner.setHTTPUserAgent("Wandora ANY23 Extractor");
        HTTPClient httpClient = runner.getHTTPClient();
        DocumentSource source = new HTTPDocumentSource(
            httpClient,
            url.toExternalForm()
        );
        namespace = url.toExternalForm();
        TripleHandler handler = new TopicMapsCreator(topicMap);
        runner.extract(source, handler);
    }
    tripletSource = null;
    return true;
}
 
Developer: wandora-team, Project: wandora, Lines of code: 20, Source: Any23Extractor.java

Example 2: run

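import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.any23.Any23;
import org.apache.any23.extractor.ExtractionException;
import org.apache.any23.http.HTTPClient;
import org.apache.any23.source.DocumentSource; // the class this example demonstrates
import org.apache.any23.source.HTTPDocumentSource;
import org.apache.any23.writer.*; // TripleHandler, TripleHandlerException and the writers
import org.apache.commons.io.FileUtils;
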
private static void run(String uri, String outputDir, String outputFormat) throws IOException, URISyntaxException, ExtractionException {
  Any23 runner = new Any23();
  runner.setHTTPUserAgent("Eurosentiment Crawler");
  HTTPClient httpClient = runner.getHTTPClient();
  DocumentSource source = new HTTPDocumentSource(
      httpClient,
      uri
      );
  ByteArrayOutputStream out = new ByteArrayOutputStream();
  TripleHandler handler = null;
  if (outputFormat != null) {
    switch (outputFormat) {
    case "turtle":
      handler = new TurtleWriter(out);
      break;
    case "ntriples":
      handler = new NTriplesWriter(out);
      break;
    case "rdfxml":
      handler = new RDFXMLWriter(out);
      break;
    case "nquads":
      handler = new NQuadsWriter(out);
      break;
    case "trix":
      handler = new TriXWriter(out);
      break;
    case "json":
      handler = new JSONWriter(out);
      break;
    default:
      System.out.println("No output writer found for type: " + outputFormat);
      System.out.println("Defaulting to Turtle output serialization");
      handler = new TurtleWriter(out);
      break;
    }
    System.out.println("Selected " + handler.getClass().getSimpleName() + " as output writer.");
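  } else {
    // No format requested: fall back to Turtle so the handler is never null below.
    handler = new TurtleWriter(out);
  }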
  try {
    runner.extract(source, handler);
  } finally {
    try {
      handler.close();
    } catch (TripleHandlerException e) {
      e.printStackTrace();
    }
  }
  // Write into outputDir when given, otherwise into the working directory.
  File target = (outputDir != null) ? new File(outputDir, "sentiment.txt") : new File("sentiment.txt");
  FileUtils.writeStringToFile(target, out.toString("UTF-8"), "UTF-8");
  System.out.println("Successfully wrote file to: " + target.getPath());
}
 
Developer: eurocent, Project: sentimentCrawler, Lines of code: 56, Source: Runner.java
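
Two steps in Example 2 do not depend on Any23: choosing an output format with a Turtle fallback, and writing the buffered result to sentiment.txt. The sketch below isolates those steps using only the JDK, so they can be tried without the library on the classpath. The class name RunnerSketch and the methods resolveFormat and writeResult are hypothetical names introduced for illustration. They are not part of Any23 or the sentimentCrawler project, and java.nio.file.Files is used here as a stand-in for the FileUtils call in the original.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Any23 or sentimentCrawler.
final class RunnerSketch {

    // Format names accepted by the switch statement in Example 2.
    private static final Set<String> KNOWN_FORMATS = new HashSet<>(
            Arrays.asList("turtle", "ntriples", "rdfxml", "nquads", "trix", "json"));

    // Returns the requested format, or "turtle" when it is null or unknown,
    // so the caller always ends up with a usable writer choice.
    static String resolveFormat(String requested) {
        if (requested != null && KNOWN_FORMATS.contains(requested)) {
            return requested;
        }
        return "turtle";
    }

    // Writes the serialized bytes to sentiment.txt, inside outputDir when it is
    // given, otherwise in the current working directory. Returns the path written.
    static Path writeResult(byte[] data, String outputDir) throws IOException {
        Path target = (outputDir != null)
                ? Paths.get(outputDir, "sentiment.txt")
                : Paths.get("sentiment.txt");
        if (target.getParent() != null) {
            // Create missing parent directories before writing.
            Files.createDirectories(target.getParent());
        }
        return Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        // Use a temporary directory so the demo leaves the working directory untouched.
        Path dir = Files.createTempDirectory("any23-sketch");
        byte[] data = "<urn:s> <urn:p> <urn:o> .\n".getBytes(StandardCharsets.UTF_8);
        Path written = writeResult(data, dir.toString());
        System.out.println(resolveFormat(null) + " -> " + written);
    }
}
```

In a real crawler, the string returned by resolveFormat would select the Any23 writer, and the bytes passed to writeResult would come from the ByteArrayOutputStream that the TripleHandler wrote into.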


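Note: the org.apache.any23.source.DocumentSource examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright of the source code remains with the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.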