

Java AbstractSiteMap Class Code Examples

This article collects typical usage examples of the Java class crawlercommons.sitemaps.AbstractSiteMap. If you are wondering how the AbstractSiteMap class is used in practice, the selected examples below may help.


The AbstractSiteMap class belongs to the crawlercommons.sitemaps package. Five code examples of the class are shown below, sorted by popularity by default.

Example 1: characters

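import crawlercommons.sitemaps.AbstractSiteMap; // the class these examples depend on
import java.net.MalformedURLException;
import java.net.URL;
import org.xml.sax.SAXException;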
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
    String localName = super.currentElement();
    String value = String.valueOf(ch, start, length);
    if ("pubDate".equals(localName)) {
        lastMod = AbstractSiteMap.normalizeRSSTimestamp(value);
        if ("channel".equals(super.currentElementParent())) {
            sitemap.setLastModified(lastMod);
        }
    } else if ("link".equals(localName)) {
        String href = value;
        LOG.debug("href = {}", href);
        try {
            loc = new URL(href);
            valid = urlIsValid(sitemap.getBaseUrl(), href);
        } catch (MalformedURLException e) {
            LOG.trace("Can't create an entry with a bad URL", e);
            LOG.debug("Bad url: [{}]", href);
        }
    }
}
 
Developer: crawler-commons, Project: crawler-commons, Lines of code: 22, Source: RSSHandler.java
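One thing to keep in mind when writing a handler like the one in Example 1: the SAX API allows a parser to deliver the text of a single element through several characters() calls. A common way to cope with that is to buffer the text and act on it in endElement(). The following is a minimal sketch of that pattern using only JDK classes; the class name BufferedTextHandlerSketch and the helper parseLinks are hypothetical and are not part of crawler-commons.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not taken from crawler-commons.
class BufferedTextHandlerSketch extends DefaultHandler {
    // Collects the text of the element currently being read.
    private final StringBuilder text = new StringBuilder();
    // Every <link> value seen so far, in document order.
    final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start with an empty buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per text node, so append instead of assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The default parser factory is not namespace-aware, so qName holds the tag name.
        if ("link".equals(qName)) {
            links.add(text.toString().trim());
        }
        text.setLength(0);
    }

    // Parses the given XML string and returns the text of all <link> elements.
    static List<String> parseLinks(String xml) {
        BufferedTextHandlerSketch handler = new BufferedTextHandlerSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.links;
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><link>http://example.com/</link>"
                + "<item><link> http://example.com/a </link></item></channel></rss>";
        System.out.println(parseLinks(xml));
    }
}
```

The value is only interpreted once the element is closed, so it does not matter how the parser chose to split the text.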

Example 2: getSitemapsForUrl

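import crawlercommons.sitemaps.AbstractSiteMap; // the class these examples depend on
import crawlercommons.sitemaps.SiteMapIndex;
import crawlercommons.sitemaps.SiteMapParser;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;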
public List<AbstractSiteMap> getSitemapsForUrl(String sitemapUrl) {
	List<AbstractSiteMap> sitemaps = new ArrayList<>();

	SiteMapParser siteMapParser = new SiteMapParser();
	try {
		Uri uri = Uri.create(sitemapUrl);
		Blob blob = Requesters.of(uri.getScheme()).get().get(uri);
		String contentType = blob.getMetadata().getContentMetadata().contentType() != null ? blob.getMetadata().getContentMetadata().contentType()
				: "text/xml";

		AbstractSiteMap sitemap = siteMapParser.parseSiteMap(contentType, IOUtils.toByteArray(blob.getPayload().openStream()), new URL(sitemapUrl));

		if (sitemap.isIndex()) {
			sitemaps.addAll(((SiteMapIndex) sitemap).getSitemaps());
		} else {
			sitemaps.add(sitemap);
		}
	} catch (Exception e) {
		log.debug("Failed to fetch or parse sitemap {}", sitemapUrl, e);
	}
	return sitemaps;
}
 
Developer: Treydone, Project: mandrel, Lines of code: 23, Source: AnalysisService.java

Example 3: getSiteMap

import crawlercommons.sitemaps.AbstractSiteMap; // the class these examples depend on
public AbstractSiteMap getSiteMap() {
    return sitemap;
}
 
Developer: crawler-commons, Project: crawler-commons, Lines of code: 4, Source: RSSHandler.java

Example 4: getSiteMap

import crawlercommons.sitemaps.AbstractSiteMap; // the class these examples depend on
public AbstractSiteMap getSiteMap() {
    if (delegate == null)
        return null;
    return delegate.getSiteMap();
}
 
Developer: crawler-commons, Project: crawler-commons, Lines of code: 6, Source: DelegatorHandler.java

Example 5: buildReport

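import crawlercommons.sitemaps.AbstractSiteMap; // the class these examples depend on
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;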
protected Analysis buildReport(Job job, Blob blob) {
	Analysis report;
	if (blob.getMetadata().getUri().getScheme().startsWith("http")) {
		HttpAnalysis temp = new HttpAnalysis();

		// Robots.txt
		Uri pageURL = blob.getMetadata().getUri();
		String robotsTxtUrl = pageURL.getScheme() + "://" + pageURL.getHost() + ":" + pageURL.getPort() + "/robots.txt";
		ExtendedRobotRules robotRules = RobotsTxtUtils.getRobotRules(robotsTxtUrl);
		temp.robotRules(robotRules);

		// Sitemaps
		if (robotRules != null && robotRules.getSitemaps() != null) {
			Map<String, List<AbstractSiteMap>> sitemaps = new HashMap<>();
			robotRules.getSitemaps().forEach(url -> {
				List<AbstractSiteMap> results = getSitemapsForUrl(url);
				sitemaps.put(url, results);
			});
			temp.sitemaps(sitemaps);
		}
		report = temp;
	} else {
		report = new Analysis();
	}

	if (job.getDefinition().getExtractors() != null) {
		Map<String, Instance<?>> cachedSelectors = new HashMap<>();

		// Page extraction
		if (job.getDefinition().getExtractors().getData() != null) {
			Map<String, List<Document>> documentsByExtractor = job.getDefinition().getExtractors().getData().stream()
					.map(ex -> Pair.of(ex.getName(), extractorService.extractThenFormat(cachedSelectors, blob, ex)))
					.filter(pair -> pair != null && pair.getKey() != null && pair.getValue() != null)
					.collect(Collectors.toMap(key -> key.getLeft(), value -> value.getRight()));
			report.documents(documentsByExtractor);
		}

		// Link extraction
		if (job.getDefinition().getExtractors().getOutlinks() != null) {
			Map<String, Pair<Set<Link>, Set<Link>>> outlinksByExtractor = job.getDefinition().getExtractors().getOutlinks().stream().map(ol -> {
				return Pair.of(ol.getName(), extractorService.extractAndFilterOutlinks(job, blob.getMetadata().getUri(), cachedSelectors, blob, ol));
			}).collect(Collectors.toMap(key -> key.getLeft(), value -> value.getRight()));

			report.outlinks(Maps.transformEntries(outlinksByExtractor, (key, entries) -> entries.getLeft()));
			report.filteredOutlinks(Maps.transformEntries(outlinksByExtractor, (key, entries) -> entries.getRight()));
		}

	}

	report.metadata(blob.getMetadata());
	return report;
}
 
Developer: Treydone, Project: mandrel, Lines of code: 53, Source: AnalysisService.java
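Example 5 builds the robots.txt address by concatenating scheme, host and port. The sketch below shows the same step with java.net.URI from the JDK, which reports a missing port as -1, so the port is appended only when the page URL carries one. Whether the project's own Uri type behaves the same way is an assumption; the class name RobotsTxtUrlSketch and the method robotsTxtUrl are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration class, not taken from mandrel or crawler-commons.
class RobotsTxtUrlSketch {

    // Builds the robots.txt URL for the site that hosts the given page.
    static String robotsTxtUrl(URI page) {
        StringBuilder sb = new StringBuilder();
        sb.append(page.getScheme()).append("://").append(page.getHost());
        // java.net.URI returns -1 when the URL has no explicit port.
        if (page.getPort() != -1) {
            sb.append(':').append(page.getPort());
        }
        return sb.append("/robots.txt").toString();
    }

    public static void main(String[] args) {
        System.out.println(robotsTxtUrl(URI.create("http://example.com/a/b.html")));
        System.out.println(robotsTxtUrl(URI.create("https://example.com:8443/index.html")));
    }
}
```

Leaving out the port when none is given keeps the result a valid URL for pages served on the default port.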


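Note: The crawlercommons.sitemaps.AbstractSiteMap examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright of the source code belongs to the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.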