

Java WARCConstants Class Code Examples

This article collects typical usage examples of the Java class org.archive.io.warc.WARCConstants. If you want to know how WARCConstants is used in practice, the selected examples below may help.


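The WARCConstants class belongs to the org.archive.io.warc package. Judging from the examples below, it collects constants used when reading and writing WARC files: header field names such as HEADER_KEY_TYPE, HEADER_KEY_BLOCK_DIGEST and HEADER_KEY_FILENAME, record type names such as RESPONSE, REVISIT, REQUEST, METADATA and WARCINFO, and DEFAULT_ENCODING. Seven examples are shown, sorted by popularity by default.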

Example 1: write

import org.archive.io.warc.WARCConstants; // the package/class this example depends on
@Override
public void write(String uri, String contentType, String hostIP,
        long fetchBeginTimeStamp, byte[] payload)
                throws java.io.IOException {
    
    String create14DigitDate = ArchiveDateConverter.getWarcDateFormat()
            .format(new Date(fetchBeginTimeStamp));
    ByteArrayInputStream in = new ByteArrayInputStream(payload);
    String blockDigest = ChecksumCalculator.calculateSha1(in);
    in = new ByteArrayInputStream(payload); // A re-read is necessary here!
    ANVLRecord namedFields = new ANVLRecord(3);
    namedFields.addLabelValue(
            WARCConstants.HEADER_KEY_BLOCK_DIGEST, "sha1:" + blockDigest);
    namedFields.addLabelValue("WARC-Warcinfo-ID", 
            generateEncapsulatedRecordID(warcInfoUID));
    namedFields.addLabelValue("WARC-IP-Address", SystemUtils.getLocalIP());
    URI recordId;
    try {
        recordId = new URI("urn:uuid:" + UUID.randomUUID().toString());
    } catch (URISyntaxException e) {
        throw new IllegalState("Epic fail creating URI from UUID!");
    }
    writer.writeResourceRecord(uri, create14DigitDate, contentType,
            recordId, namedFields, in, payload.length);
}
 
Developer ID: netarchivesuite, Project: netarchivesuite-svngit-migration, Lines of code: 26, Source file: MetadataFileWriterWarc.java
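The write method above assembles three WARC header ingredients: a SHA-1 block digest labelled `sha1:`, a WARC date derived from the fetch time, and a `urn:uuid:` record ID. The JDK-only sketch below reproduces those pieces so they can be inspected without NetarchiveSuite or the org.archive libraries on the classpath. It rests on assumptions: the class and method names are hypothetical, the field name `WARC-Block-Digest` is taken to be the value of `WARCConstants.HEADER_KEY_BLOCK_DIGEST`, the digest is hex-encoded (we have not checked which encoding `ChecksumCalculator.calculateSha1` produces; Heritrix-style tools commonly write Base32 instead), and the date is a second-precision ISO 8601 UTC timestamp as used by WARC 1.0.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical helper: builds a few WARC named fields for a resource record using only the JDK. */
final class WarcResourceHeaderSketch {

    private WarcResourceHeaderSketch() {}

    /** Lower-case hex SHA-1 of the whole block (assumption: hex rather than Base32). */
    static String sha1Hex(byte[] block) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(block);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every standard JRE provides SHA-1, so this is not expected to happen.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** WARC-Date in UTC, truncated to whole seconds, e.g. 1970-01-01T00:00:00Z. */
    static String warcDate(long epochMillis) {
        Instant instant = Instant.ofEpochMilli(epochMillis).truncatedTo(ChronoUnit.SECONDS);
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    /** A fresh record ID; in WARC headers record-ID URIs are written inside angle brackets. */
    static String newRecordId() {
        return "<urn:uuid:" + UUID.randomUUID() + ">";
    }

    /** Named fields in insertion order, mirroring the three ANVLRecord entries in write(). */
    static Map<String, String> namedFields(byte[] payload, String warcinfoId, String ipAddress) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("WARC-Block-Digest", "sha1:" + sha1Hex(payload));
        fields.put("WARC-Warcinfo-ID", warcinfoId);
        fields.put("WARC-IP-Address", ipAddress);
        return fields;
    }
}
```

Hashing the byte array directly, as here, avoids the second `ByteArrayInputStream` that the original needs because its checksum helper consumes the first stream.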

Example 2: index

import org.archive.io.warc.WARCConstants; // the package/class this example depends on
/**
 * Create and return the index of the ArcHarvestFile.
 * @param baseDir the base directory of the arcs
 * @return the indexed resources, as populated by indexWARCResponse and indexARCRecord
 * @throws IOException thrown if there is an error reading the archive
 * @throws ParseException thrown by the record indexers if a record cannot be parsed
 */
public Map<String, HarvestResourceDTO> index(File baseDir) throws IOException, ParseException {
    Map<String, HarvestResourceDTO> results = new HashMap<String, HarvestResourceDTO>();

    File theArchiveFile = new File(baseDir, this.getName());
    ArchiveReader reader = ArchiveReaderFactory.get(theArchiveFile);
    try {
        this.compressed = reader.isCompressed();

        Iterator<ArchiveRecord> it = reader.iterator();
        while(it.hasNext()) {
            ArchiveRecord rec = it.next();

            if(rec instanceof WARCRecord) {
                // Only WARC "response" records are indexed; DNS lookups are skipped.
                Object type = rec.getHeader().getHeaderValue(WARCConstants.HEADER_KEY_TYPE);
                if(type != null && WARCConstants.RESPONSE.equals(type.toString())) {
                    String mime = rec.getHeader().getMimetype();
                    if(!"text/dns".equals(mime)) {
                        indexWARCResponse(rec, results);
                    }
                }
            }
            else {
                indexARCRecord(rec, results);
            }
        }
    } finally {
        // Release the underlying file even if indexing fails part-way.
        reader.close();
    }

    return results;
}
 
Developer ID: DIA-NZ, Project: webcurator, Lines of code: 35, Source file: ArcHarvestFileDTO.java
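Stripped of the reader plumbing, the filtering rule in index() is small: a WARC record is indexed only if its `WARC-Type` is `response` and its MIME type is not `text/dns`, while ARC records go to a separate indexer. The sketch below isolates that rule with hypothetical names and a two-field record model so it can be tested on its own. It assumes `WARCConstants.RESPONSE` holds the literal WARC type name `response`; it treats a missing type as a non-match and a missing MIME type as "not DNS", rather than throwing a NullPointerException.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the filtering in index(): keep WARC "response" records except DNS lookups. */
final class WarcIndexFilterSketch {

    /** Minimal record model (assumption): only the two header values the filter needs. */
    static final class RecordInfo {
        final String warcType; // value of the WARC-Type field, may be null
        final String mimeType; // Content-Type of the record, may be null

        RecordInfo(String warcType, String mimeType) {
            this.warcType = warcType;
            this.mimeType = mimeType;
        }
    }

    private WarcIndexFilterSketch() {}

    /** True for "response" records whose MIME type is not text/dns; null-safe on both fields. */
    static boolean shouldIndex(RecordInfo rec) {
        return "response".equals(rec.warcType) && !"text/dns".equals(rec.mimeType);
    }

    /** Returns the records that pass shouldIndex, in their original order. */
    static List<RecordInfo> selectForIndex(List<RecordInfo> records) {
        List<RecordInfo> out = new ArrayList<>();
        for (RecordInfo rec : records) {
            if (shouldIndex(rec)) {
                out.add(rec);
            }
        }
        return out;
    }
}
```

Putting the string literal on the left of `equals` is what makes the checks null-safe.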

Example 3: skipHeaders

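import org.archive.io.warc.WARCConstants; // the package/class this example depends on
private void skipHeaders(ArchiveRecord record) throws IOException {
    // Parse and discard the HTTP headers so the stream is positioned at the entity body.
    HttpParser.parseHeaders(record, WARCConstants.DEFAULT_ENCODING);
}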
 
Developer ID: DIA-NZ, Project: webcurator, Lines of code: 4, Source file: ArcDigitalAssetStoreService.java
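`ArchiveRecord` is an `InputStream`, so consuming the HTTP header block leaves the record positioned at the response body. The call above appears to use Apache Commons HttpClient 3.x's `HttpParser`; the JDK-only sketch below shows the same idea by reading lines until the first blank one. The class name is hypothetical, and the sketch is deliberately simpler than `HttpParser.parseHeaders`: it treats every line before the blank line, including an HTTP status line if one is present, as a header line, and it neither splits name/value pairs nor handles folded lines. ISO-8859-1 is used in the test as the conventional charset for HTTP header bytes; we have not verified that it equals `WARCConstants.DEFAULT_ENCODING` in every library version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical JDK-only way to skip an HTTP header block at the start of a record body. */
final class HttpHeaderSkipperSketch {

    private HttpHeaderSkipperSketch() {}

    /**
     * Reads lines until the first empty line and returns them without line endings.
     * The stream is left positioned at the first byte after the blank line.
     */
    static List<String> readHeaderLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                byte[] bytes = line.toByteArray();
                int len = bytes.length;
                if (len > 0 && bytes[len - 1] == '\r') {
                    len--; // drop the CR of a CRLF line ending
                }
                if (len == 0) {
                    return lines; // blank line: end of the header block
                }
                lines.add(new String(bytes, 0, len, charset));
                line.reset();
            } else {
                line.write(b);
            }
        }
        // Stream ended without a blank line: keep any partial last line.
        if (line.size() > 0) {
            lines.add(new String(line.toByteArray(), charset));
        }
        return lines;
    }
}
```

Because the method returns as soon as it sees the blank line, the next `read()` on the same stream yields the first body byte.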

Example 4: getRecordType

import org.archive.io.warc.WARCConstants; // the package/class this example depends on
/**
 * Find out what type of WARC-record this is.
 * @param record a given WARCRecord
 * @return the type of WARCRecord as a String.
 */
public static String getRecordType(WARCRecord record) {
    ArgumentNotValid.checkNotNull(record, "record");
    ArchiveRecordHeader header = record.getHeader();
    return (String) header.getHeaderValue(WARCConstants.HEADER_KEY_TYPE);
}
 
Developer ID: netarchivesuite, Project: netarchivesuite-svngit-migration, Lines of code: 11, Source file: WARCUtils.java
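`getRecordType` only reads the `WARC-Type` named field from the already-parsed record header. To show what that field looks like in the file itself, the sketch below parses a raw WARC header block (a version line followed by CRLF-terminated `Name: value` lines) into a map and looks up `WARC-Type`. The class name is hypothetical; the map is case-insensitive because WARC field names, like HTTP ones, are not case-sensitive. The sketch ignores continuation lines and assumes `WARCConstants.HEADER_KEY_TYPE` is the string `WARC-Type`.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical: reads the WARC-Type field from a raw WARC header block using only the JDK. */
final class WarcTypeSketch {

    private WarcTypeSketch() {}

    /** Parses "Name: value" lines after the version line; names compare case-insensitively. */
    static Map<String, String> parseNamedFields(String headerBlock) {
        Map<String, String> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] lines = headerBlock.split("\r\n");
        for (int i = 1; i < lines.length; i++) { // lines[0] is the version line, e.g. "WARC/1.0"
            String line = lines[i];
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Split on the first colon only, so values such as URIs keep their own colons.
                fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    /** Counterpart of getRecordType(): the WARC-Type value, or null if the field is absent. */
    static String recordType(String headerBlock) {
        return parseNamedFields(headerBlock).get("WARC-Type");
    }
}
```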

Example 5: adaptInner

import org.archive.io.warc.WARCConstants; // the package/class this example depends on
private CaptureSearchResult adaptInner(WARCRecord rec) throws IOException {

    ArchiveRecordHeader header = rec.getHeader();

    String type = header.getHeaderValue(WARCConstants.HEADER_KEY_TYPE).toString();
    // if(type.equals(WARCConstants.WARCINFO)) {
    //     LOGGER.info("Skipping record type : " + type);
    //     return null;
    // }

    CaptureSearchResult result = genericResult(rec);

    if(type.equals(WARCConstants.RESPONSE)) {
        String mime = annotater.transformHTTPMime(header.getMimetype());
        if(mime != null && mime.equals("text/dns")) {
            // Closing the record reads it to the end, so the computed digest is complete.
            // TODO: DO we want to use the WARC header digest for this?
            rec.close();
            result.setDigest(transformWARCDigest(rec.getDigestStr()));
            result.setMimeType(mime);
        } else {
            result = adaptWARCHTTPResponse(result, rec);
        }
    } else if(type.equals(WARCConstants.REVISIT)) {
        // also set the mime type:
        result.setMimeType("warc/revisit");

    } else if(type.equals(WARCConstants.REQUEST)) {

        if(processAll) {
            // also set the mime type:
            result.setMimeType("warc/request");
        } else {
            result = null;
        }
    } else if(type.equals(WARCConstants.METADATA)) {

        if(processAll) {
            // also set the mime type:
            result.setMimeType("warc/metadata");
        } else {
            result = null;
        }
    } else if(type.equals(WARCConstants.WARCINFO)) {

        result.setMimeType(WARC_FILEDESC_VERSION);

    } else {
        // Note: despite this log message, the generic result is still returned.
        LOGGER.info("Skipping record type : " + type);
    }

    return result;
}
 
Developer ID: netarchivesuite, Project: netarchivesuite-svngit-migration, Lines of code: 54, Source file: NetarchiveSuiteWARCRecordToSearchResultAdapter.java
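The if/else chain in adaptInner() amounts to a table from WARC record type to the MIME value stored on the search result, with `null` meaning the record is dropped. The sketch below writes that table as a string switch so the cases are easier to compare. `DEFAULT_VALUE` and `WARCINFO_MIME` are hypothetical placeholders (the real values of the original's `DEFAULT_VALUE` and `WARC_FILEDESC_VERSION` are not shown here), and the `response` branch simply passes through the MIME type the original derives from the HTTP response.

```java
/**
 * Hypothetical summary of the type dispatch in adaptInner(): the MIME value a search
 * result ends up with for each WARC-Type, or null if the record is dropped.
 */
final class WarcTypeDispatchSketch {

    /** Placeholder for genericResult()'s DEFAULT_VALUE (assumed, not taken from the source). */
    static final String DEFAULT_VALUE = "-";
    /** Placeholder for WARC_FILEDESC_VERSION; the real constant's value is not shown above. */
    static final String WARCINFO_MIME = "warc/warcinfo";

    private WarcTypeDispatchSketch() {}

    static String resultMime(String warcType, String httpMime, boolean processAll) {
        if (warcType == null) {
            return null; // the original would throw a NullPointerException on toString()
        }
        switch (warcType) {
            case "response":
                return httpMime; // in the original, derived from the HTTP response
            case "revisit":
                return "warc/revisit";
            case "request":
                return processAll ? "warc/request" : null;
            case "metadata":
                return processAll ? "warc/metadata" : null;
            case "warcinfo":
                return WARCINFO_MIME;
            default:
                // As in the original, other types are only logged; the result keeps the default MIME.
                return DEFAULT_VALUE;
        }
    }
}
```

Written this way, it is easy to see that an unrecognised type such as `resource` is not actually dropped, only logged.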

Example 6: genericResult

import org.archive.io.warc.WARCConstants; // the package/class this example depends on
private CaptureSearchResult genericResult(WARCRecord rec) {

    CaptureSearchResult result = new CaptureSearchResult();

    result.setMimeType(DEFAULT_VALUE);
    result.setHttpCode(DEFAULT_VALUE);
    result.setRedirectUrl(DEFAULT_VALUE);

    ArchiveRecordHeader header = rec.getHeader();

    String file = transformWARCFilename(header.getReaderIdentifier());
    long offset = header.getOffset();

    result.setCaptureTimestamp(transformWARCDate(header.getDate()));
    result.setFile(file);
    result.setOffset(offset);
    result.setDigest(transformWARCDigest(header.getHeaderValue(
            WARCRecord.HEADER_KEY_PAYLOAD_DIGEST)));

    String origUrl = header.getUrl();
    if(origUrl == null) {
        String type = header.getHeaderValue(WARCConstants.HEADER_KEY_TYPE).toString();
        if(type.equals(WARCConstants.WARCINFO)) {
            String filename = header.getHeaderValue(
                    WARCConstants.HEADER_KEY_FILENAME).toString();
            result.setOriginalUrl("filedesc:" + filename);
            result.setUrlKey("filedesc:" + filename);
        } else {
            result.setOriginalUrl(DEFAULT_VALUE);
            result.setUrlKey(DEFAULT_VALUE);
        }
    } else {
        result.setOriginalUrl(origUrl);
        try {
            String urlKey = canonicalizer.urlStringToKey(origUrl);
            result.setUrlKey(urlKey);
        } catch (URIException e) {
            String shortUrl =
                    (origUrl.length() < 100)
                    ? origUrl
                    : origUrl.substring(0, 100);
            LOGGER.warning("FAILED canonicalize(" + shortUrl + "):" +
                    file + " " + offset);
            result.setUrlKey(origUrl);
        }
    }
    return result;
}
 
Developer ID: netarchivesuite, Project: netarchivesuite-svngit-migration, Lines of code: 51, Source file: NetarchiveSuiteWARCRecordToSearchResultAdapter.java
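genericResult() falls back in two places: a record without a target URI gets a `filedesc:<filename>` key if it is a `warcinfo` record and a default value otherwise, and a URL that the canonicalizer rejects is used as its own key, with at most 100 characters of it written to the log. The sketch below isolates those rules. The class name, the `"-"` placeholder for `DEFAULT_VALUE`, and the use of a `Function` throwing `IllegalArgumentException` as a stand-in for the real canonicalizer (which throws the checked `URIException`) are all assumptions.

```java
import java.util.function.Function;

/** Hypothetical JDK-only version of genericResult()'s URL-key fallbacks and log truncation. */
final class UrlKeyFallbackSketch {

    static final String DEFAULT_VALUE = "-"; // assumed placeholder for the original's DEFAULT_VALUE
    static final int MAX_LOGGED_URL = 100;

    private UrlKeyFallbackSketch() {}

    /** Key for a record with no target URI: "filedesc:<filename>" for warcinfo, else the default. */
    static String keyWithoutUrl(String warcType, String warcFilename) {
        // Unlike the original, a missing filename falls back to the default instead of throwing.
        if ("warcinfo".equals(warcType) && warcFilename != null) {
            return "filedesc:" + warcFilename;
        }
        return DEFAULT_VALUE;
    }

    /** At most the first 100 characters of a URL, for log messages. */
    static String shortenForLog(String url) {
        return url.length() <= MAX_LOGGED_URL ? url : url.substring(0, MAX_LOGGED_URL);
    }

    /** Applies the canonicalizer; if it rejects the URL, the raw URL becomes the key. */
    static String urlKey(String url, Function<String, String> canonicalizer) {
        try {
            return canonicalizer.apply(url);
        } catch (IllegalArgumentException e) {
            return url;
        }
    }
}
```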

Example 7: getRecordType

import org.archive.io.warc.WARCConstants; // the package/class this example depends on
/**
 * Find out what type of WARC-record this is.
 * @param record a given WARCRecord
 * @return the type of WARCRecord as a String.
 */
public static String getRecordType(WARCRecord record) {
    ArchiveRecordHeader header = record.getHeader();
    return (String) header.getHeaderValue(WARCConstants.HEADER_KEY_TYPE);
}
 
Developer ID: netarchivesuite, Project: netarchivesuite-svngit-migration, Lines of code: 10, Source file: WARCUtilsInTest.java


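Note: The org.archive.io.warc.WARCConstants examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects contributed by various developers, and copyright in the source code remains with the original authors. Consult the corresponding project's license before distributing or using the code; do not republish without permission.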