Java WARCRecordInfo.getRecordId Method Code Examples

This article collects typical usage examples of the org.archive.io.warc.WARCRecordInfo.getRecordId method in Java. If you are wondering how WARCRecordInfo.getRecordId is used in practice, the selected examples below may help. You can also look further into usage examples of its enclosing class, org.archive.io.warc.WARCRecordInfo.


Two code examples of the WARCRecordInfo.getRecordId method are shown below, sorted by popularity by default.

Example 1: writeRequest

import org.archive.io.warc.WARCRecordInfo; // import the package/class the method depends on
protected URI writeRequest(URI id) throws IOException, ParseException {
  WARCRecordInfo record = new WARCRecordInfo();

  record.setType(WARCConstants.WARCRecordType.request);
  record.setUrl(getUrl());
  record.setCreate14DigitDate(DateUtils
      .getLog14Date(Long.parseLong(metadata.get("nutch.fetch.time"))));
  record.setMimetype(WARCConstants.HTTP_REQUEST_MIMETYPE);
  record.setRecordId(GENERATOR.getRecordID());

  if (id != null) {
    ANVLRecord headers = new ANVLRecord();
    headers.addLabelValue(WARCConstants.HEADER_KEY_CONCURRENT_TO,
        '<' + id.toString() + '>');
    record.setExtraHeaders(headers);
  }

  ByteArrayOutputStream output = new ByteArrayOutputStream();

  output.write(metadata.get("_request_").getBytes());
  record.setContentLength(output.size());
  record.setContentStream(new ByteArrayInputStream(output.toByteArray()));

  writer.writeRecord(record);

  return record.getRecordId();
}
 
Developer ID: jorcox, Project: GeoCrawler, Lines of code: 28, Source: CommonCrawlFormatWARC.java
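
The request example wraps the related record's ID in angle brackets before storing it under the concurrent-to header key. The following is a minimal, standard-library-only sketch of that formatting step. It assumes record IDs are `urn:uuid:` URIs, which is common for WARC files but is not confirmed by the snippet above; the class `ConcurrentToSketch` and both helper methods are hypothetical names, not part of the archive library.

```java
import java.net.URI;
import java.util.UUID;

class ConcurrentToSketch {
    // Hypothetical helper: builds a record ID as a urn:uuid URI (an assumption,
    // standing in for whatever GENERATOR.getRecordID() returns in the example).
    static URI newRecordId() {
        return URI.create("urn:uuid:" + UUID.randomUUID());
    }

    // Wraps a record ID in angle brackets, as the example does for the
    // concurrent-to header value.
    static String concurrentToValue(URI id) {
        return "<" + id + ">";
    }

    public static void main(String[] args) {
        URI relatedId = newRecordId();
        System.out.println("WARC-Concurrent-To: " + concurrentToValue(relatedId));
    }
}
```

In the example, the URI returned by getRecordId() on one record is the kind of value that could be passed as the `id` argument when writing a related record.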

Example 2: writeResponse

import org.archive.io.warc.WARCRecordInfo; // import the package/class the method depends on
protected URI writeResponse() throws IOException, ParseException {
  WARCRecordInfo record = new WARCRecordInfo();

  record.setType(WARCConstants.WARCRecordType.response);
  record.setUrl(getUrl());

  record.setCreate14DigitDate(DateUtils
      .getLog14Date(Long.parseLong(metadata.get("nutch.fetch.time"))));
  record.setMimetype(WARCConstants.HTTP_RESPONSE_MIMETYPE);
  record.setRecordId(GENERATOR.getRecordID());

  String IP = getResponseAddress();

  if (StringUtils.isNotBlank(IP))
    record.addExtraHeader(WARCConstants.HEADER_KEY_IP, IP);

  if (ParseSegment.isTruncated(content))
    record.addExtraHeader(WARCConstants.HEADER_KEY_TRUNCATED, "unspecified");

  ByteArrayOutputStream output = new ByteArrayOutputStream();

  String httpHeaders = metadata.get("_response.headers_");

  if (StringUtils.isNotBlank(httpHeaders)) {
    output.write(httpHeaders.getBytes());
  } else {
    // change the record type to resource because we have no information
    // about the headers
    record.setType(WARCConstants.WARCRecordType.resource);
    record.setMimetype(content.getContentType());
  }

  output.write(getResponseContent().getBytes());

  record.setContentLength(output.size());
  record.setContentStream(new ByteArrayInputStream(output.toByteArray()));

  if (output.size() > 0) {
    // avoid generating a 0 sized record, as the webarchive library will
    // complain about it
    writer.writeRecord(record);
  }

  return record.getRecordId();
}
 
Developer ID: jorcox, Project: GeoCrawler, Lines of code: 48, Source: CommonCrawlFormatWARC.java
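
The response example falls back to a resource record when no HTTP headers were stored, and it skips writing when the payload is empty. The sketch below isolates that decision using only the standard library. All names here (`PayloadSketch`, `RecordType`, `chooseType`, `buildPayload`) are hypothetical; the blank check only approximates StringUtils.isNotBlank, and UTF-8 is an assumption, since the original calls getBytes() with the platform default charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class PayloadSketch {
    // Hypothetical stand-in for the two record types used in the example.
    enum RecordType { RESPONSE, RESOURCE }

    // Without stored HTTP headers, the record is treated as a resource.
    static RecordType chooseType(String httpHeaders) {
        boolean hasHeaders = httpHeaders != null && !httpHeaders.trim().isEmpty();
        return hasHeaders ? RecordType.RESPONSE : RecordType.RESOURCE;
    }

    // Concatenates the optional headers and the body into one payload.
    static byte[] buildPayload(String httpHeaders, String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (chooseType(httpHeaders) == RecordType.RESPONSE) {
            byte[] h = httpHeaders.getBytes(StandardCharsets.UTF_8);
            out.write(h, 0, h.length);
        }
        byte[] b = body.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = buildPayload("", "hello");
        // Only a non-empty payload would be written as a record.
        if (payload.length > 0) {
            System.out.println(chooseType("") + " record, " + payload.length + " bytes");
        }
    }
}
```

Note that the original method returns the record ID even when the zero-length check prevents the record from being written, so a caller cannot tell from the return value alone whether a record exists.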


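Note: The org.archive.io.warc.WARCRecordInfo.getRecordId examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright in the source code remains with the original authors. For distribution and use, refer to the License of the corresponding project; do not reproduce without permission.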