Java CleanerProperties.setOmitDoctypeDeclaration Method Code Examples

This article collects typical usage examples of the Java method org.htmlcleaner.CleanerProperties.setOmitDoctypeDeclaration. If you are wondering how CleanerProperties.setOmitDoctypeDeclaration is used in practice, the selected examples below may help. You can also look at usage examples of its enclosing class, org.htmlcleaner.CleanerProperties.


Nine code examples of the CleanerProperties.setOmitDoctypeDeclaration method are shown below, sorted by popularity by default.

Example 1: createHtmlCleaner

import org.htmlcleaner.CleanerProperties; // class that declares the method
import org.htmlcleaner.HtmlCleaner;
private static HtmlCleaner createHtmlCleaner() {
    HtmlCleaner result = new HtmlCleaner();
    CleanerProperties cleanerProperties = result.getProperties();

    cleanerProperties.setAdvancedXmlEscape(true);

    cleanerProperties.setOmitXmlDeclaration(true);
    cleanerProperties.setOmitDoctypeDeclaration(false);

    cleanerProperties.setTranslateSpecialEntities(true);
    cleanerProperties.setTransResCharsToNCR(true);
    cleanerProperties.setRecognizeUnicodeChars(true);

    cleanerProperties.setIgnoreQuestAndExclam(true);
    cleanerProperties.setUseEmptyElementTags(false);

    cleanerProperties.setPruneTags("script,title");

    return result;
}
 
Developer: SysdataSpA, Project: SDHtmlTextView, Lines of code: 21, Source: HtmlSpanner.java

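Example 2: htmlOutputStreamViaHtmlCleaner

import java.io.FileInputStream;
import java.io.IOException;
import org.htmlcleaner.CleanerProperties; // class that declares the method
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.PrettyHtmlSerializer;
import org.htmlcleaner.TagNode;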
/**
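 * Cleans an HTML (hOCR) file with HtmlCleaner and writes the result to a file.
 * 
 * @param pathOfHOCRFile path of the input hOCR file
 * @param outputFilePath path of the cleaned output file
 * @throws IOException if the input file cannot be read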
 */
public static void htmlOutputStreamViaHtmlCleaner(String pathOfHOCRFile, String outputFilePath) throws IOException {
	CleanerProperties cleanerProps = new CleanerProperties();

	// set some properties to non-default values
	cleanerProps.setTransResCharsToNCR(true);
	cleanerProps.setTranslateSpecialEntities(true);
	cleanerProps.setOmitComments(true);
	cleanerProps.setOmitDoctypeDeclaration(true);
	cleanerProps.setOmitXmlDeclaration(false);
	HtmlCleaner cleaner = new HtmlCleaner(cleanerProps);

	// take default cleaner properties
	// CleanerProperties props = cleaner.getProperties();
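	TagNode tagNode;
	// try-with-resources closes the input stream even if cleaning fails
	try (FileInputStream hOCRFileInputStream = new FileInputStream(pathOfHOCRFile)) {
		// UTF_ENCODING is a constant defined elsewhere in the source class
		tagNode = cleaner.clean(hOCRFileInputStream, UTF_ENCODING);
	}
	try {
		new PrettyHtmlSerializer(cleanerProps).writeToFile(tagNode, outputFilePath, UTF_ENCODING);
	} catch (Exception e) { // NOPMD.
		// Write failures are silently ignored in the original source; consider logging them.
	}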
}
 
Developer: kuzavas, Project: ephesoft, Lines of code: 31, Source: XMLUtil.java

Example 3: getSerialized

import java.io.IOException;
import org.htmlcleaner.CleanerProperties; // class that declares the method
import org.htmlcleaner.HtmlCleaner;
/**
 * Convenience method (for xml/xhtml): serializes the parsed page.
 *
 * @param inSerializer
 *            {@link XmlSerializer}
 * @return String the cleaned and serialized html
 * @throws IOException
 */
public String getSerialized(final XmlSerializer inSerializer)
        throws IOException {
	if (docNode == null) {
		return ""; //$NON-NLS-1$
	}

	final CleanerProperties lProps = new HtmlCleaner().getProperties();
	lProps.setUseCdataForScriptAndStyle(true);
	lProps.setRecognizeUnicodeChars(true);
	lProps.setUseEmptyElementTags(true);
	lProps.setAdvancedXmlEscape(true);
	lProps.setTranslateSpecialEntities(true);
	lProps.setBooleanAttributeValues("empty"); //$NON-NLS-1$
	lProps.setNamespacesAware(true);
	lProps.setOmitXmlDeclaration(false);
	lProps.setOmitDoctypeDeclaration(true);
	lProps.setOmitHtmlEnvelope(false);

	docNode.getAttributes().remove(NS_XML);

	return inSerializer.getSerializer(lProps).getXmlAsString(docNode);
}
 
Developer: aktion-hip, Project: relations, Lines of code: 31, Source: XPathHelper.java

Example 4: createCleanerProperties

import org.htmlcleaner.CleanerProperties; // class that declares the method
private static CleanerProperties createCleanerProperties() {
    CleanerProperties properties = new CleanerProperties();

    // See http://htmlcleaner.sourceforge.net/parameters.php for descriptions
    properties.setNamespacesAware(false);
    properties.setAdvancedXmlEscape(false);
    properties.setOmitXmlDeclaration(true);
    properties.setOmitDoctypeDeclaration(false);
    properties.setTranslateSpecialEntities(false);
    properties.setRecognizeUnicodeChars(false);
    properties.setIgnoreQuestAndExclam(false);
    properties.setAllowHtmlInsideAttributes(true);

    return properties;
}
 
Developer: scoute-dich, Project: K9-MailClient, Lines of code: 16, Source: HtmlSanitizer.java

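Example 5: getTextFromHtmlString

import org.htmlcleaner.CleanerProperties; // class that declares the method
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;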
/**
 * This method extracts the text from html string.
 * @param htmlString {@link String}
 * @return {@link String}
 */
public static String getTextFromHtmlString(String htmlString) {
	String errorText = "";
	CleanerProperties cleanerProps = new CleanerProperties();
	// set some properties to non-default values
	cleanerProps.setTransResCharsToNCR(true);
	cleanerProps.setTranslateSpecialEntities(true);
	cleanerProps.setOmitComments(true);
	cleanerProps.setOmitDoctypeDeclaration(true);
	cleanerProps.setOmitXmlDeclaration(true);
	cleanerProps.setUseEmptyElementTags(true);

	HtmlCleaner cleaner = new HtmlCleaner(cleanerProps);
	TagNode tagNode = cleaner.clean(htmlString);
	Object[] rootNode = null;
	try {
		rootNode = tagNode.evaluateXPath("//table");
		if (null != rootNode && rootNode.length > 0) {
			TagNode[] textNode = ((TagNode) rootNode[rootNode.length - 1]).getElementsByName("td", true);
			for (TagNode tag : textNode) {
				if (tag != null && tag.getText() != null) {
					// NOTE: a trimmed string can never equal " ", so this branch is
					// unreachable as written; the original source probably compared
					// against an HTML entity that was lost when the page was rendered.
					if (tag.getText().toString().trim().equals(" ")) {
						errorText += " ";
					} else {
						errorText += tag.getText();
					}
				}
			}
		}
	} catch (XPatherException e) {
		LOGGER.error("Error extracting table node from html." + e.getMessage());
	}
	return errorText;
}
 
Developer: kuzavas, Project: ephesoft, Lines of code: 43, Source: AbstractUploadFile.java

Example 6: createHtmlCleaner

import org.htmlcleaner.CleanerProperties; // class that declares the method
import org.htmlcleaner.HtmlCleaner;
private static HtmlCleaner createHtmlCleaner() {
    HtmlCleaner result = new HtmlCleaner();
    CleanerProperties cleanerProperties = result.getProperties();
    cleanerProperties.setOmitXmlDeclaration(true);
    cleanerProperties.setOmitDoctypeDeclaration(false);
    cleanerProperties.setRecognizeUnicodeChars(true);
    cleanerProperties.setTranslateSpecialEntities(false);
    cleanerProperties.setIgnoreQuestAndExclam(true);
    cleanerProperties.setUseEmptyElementTags(false);
    return result;
}
 
Developer: DASAR, Project: epublib-android, Lines of code: 12, Source: HtmlCleanerBookProcessor.java

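Example 7: stripSignatureForHtmlMessage

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import org.htmlcleaner.CleanerProperties; // class that declares the method
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.SimpleHtmlSerializer;
import org.htmlcleaner.TagNode;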
public static String stripSignatureForHtmlMessage(String content) {
    Matcher dashSignatureHtml = DASH_SIGNATURE_HTML.matcher(content);
    if (dashSignatureHtml.find()) {
        Matcher blockquoteStart = BLOCKQUOTE_START.matcher(content);
        Matcher blockquoteEnd = BLOCKQUOTE_END.matcher(content);
        List<Integer> start = new ArrayList<>();
        List<Integer> end = new ArrayList<>();

        while (blockquoteStart.find()) {
            start.add(blockquoteStart.start());
        }
        while (blockquoteEnd.find()) {
            end.add(blockquoteEnd.start());
        }
        if (start.size() != end.size()) {
            Log.d(K9.LOG_TAG, "There are " + start.size() + " <blockquote> tags, but " +
                    end.size() + " </blockquote> tags. Refusing to strip.");
        } else if (start.size() > 0) {
            // Ignore quoted signatures in blockquotes.
            dashSignatureHtml.region(0, start.get(0));
            if (dashSignatureHtml.find()) {
                // before first <blockquote>.
                content = content.substring(0, dashSignatureHtml.start());
            } else {
                for (int i = 0; i < start.size() - 1; i++) {
                    // within blockquotes.
                    if (end.get(i) < start.get(i + 1)) {
                        dashSignatureHtml.region(end.get(i), start.get(i + 1));
                        if (dashSignatureHtml.find()) {
                            content = content.substring(0, dashSignatureHtml.start());
                            break;
                        }
                    }
                }
                if (end.get(end.size() - 1) < content.length()) {
                    // after last </blockquote>.
                    dashSignatureHtml.region(end.get(end.size() - 1), content.length());
                    if (dashSignatureHtml.find()) {
                        content = content.substring(0, dashSignatureHtml.start());
                    }
                }
            }
        } else {
            // No blockquotes found.
            content = content.substring(0, dashSignatureHtml.start());
        }
    }

    // Fix the stripping off of closing tags if a signature was stripped,
    // as well as clean up the HTML of the quoted message.
    HtmlCleaner cleaner = new HtmlCleaner();
    CleanerProperties properties = cleaner.getProperties();

    // see http://htmlcleaner.sourceforge.net/parameters.php for descriptions
    properties.setNamespacesAware(false);
    properties.setAdvancedXmlEscape(false);
    properties.setOmitXmlDeclaration(true);
    properties.setOmitDoctypeDeclaration(false);
    properties.setTranslateSpecialEntities(false);
    properties.setRecognizeUnicodeChars(false);

    TagNode node = cleaner.clean(content);
    SimpleHtmlSerializer htmlSerialized = new SimpleHtmlSerializer(properties);
    content = htmlSerialized.getAsString(node, "UTF8");
    return content;
}
 
Developer: scoute-dich, Project: K9-MailClient, Lines of code: 67, Source: QuotedMessageHelper.java

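Example 8: downloadResearchesPages

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.logging.Level;
import org.htmlcleaner.CleanerProperties; // class that declares the method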
public static void downloadResearchesPages(String destDir,
                                           String sInstitutionName,
                                           TreeMap<String,
                                            TreeMap<String, List<String>>
                                           > treeInstitution) 
{
    try
    {
        CleanerProperties props = new CleanerProperties();

        // set some properties to non-default values
        //props.setTranslateSpecialEntities(true);
        //props.setTransResCharsToNCR(true);
        props.setOmitComments(true);
        props.setOmitXmlDeclaration(true);
        props.setAdvancedXmlEscape(true);
        props.setNamespacesAware(false);
        props.setOmitDoctypeDeclaration(true);
        
        String sUnitOfAssessment_Description = "";
        String sResearchGroupDescription = "";
        String sResearchName = "";
        String sResearchInitials = "";            

        File dirI = new File(destDir + System.getProperty("file.separator") + sInstitutionName.replaceAll("[^a-z^A-Z]","") + System.getProperty("file.separator"));
        if(!dirI.mkdir()) throw new Exception("Cant create " + dirI.getPath());
        else
        for (String keyAssessment_Description : treeInstitution.keySet())
        {
            sUnitOfAssessment_Description = keyAssessment_Description;
                    
            if(sUnitOfAssessment_Description.length() > 20) sUnitOfAssessment_Description = sUnitOfAssessment_Description.substring(0, 20);

            File dirUAD = new File(dirI.getPath() + System.getProperty("file.separator") + sUnitOfAssessment_Description.replaceAll("[^a-z^A-Z]","") + System.getProperty("file.separator"));
            if(!dirUAD.mkdir()) throw new Exception("Cant create " + dirUAD.getPath());
            
                TreeMap<String, List<String>> treeResearchers = treeInstitution.get(keyAssessment_Description);
                
                for (String keyResearcher : treeResearchers.keySet())
                {
                    String sAux = keyResearcher;

                    File dirR = new File(dirUAD.getPath() + System.getProperty("file.separator") + sAux + System.getProperty("file.separator"));
                    if(!dirR.exists())
                    {
                        if(!dirR.mkdir()) throw new Exception("Cant create " + dirR.getPath());
                    }
                    else
                    {
                        LOG.info("Repeated: " + sAux);
                        break;
                    }
                    
                    List<String> lstResearcherWebAddress = treeResearchers.get(keyResearcher);

                    //int iCount = 0;
                    List<String> lstLocalResearcherWebAddress = new ArrayList<String>();
                    for (String url : lstResearcherWebAddress)
                    {
                        String ext = XMLTags.RESEARCHER_WEB_ADDRESS_ATTR_EXT_VALUE_DEFAULT_HTML;
                        String type = XMLTags.RESEARCHER_WEB_ADDRESS_ATTR_TYPE_VALUE_DEFAULT_CV;
                        
                        String fileDownloaded = ResearchersPagePostProcessor.downloadAndClean(dirR.getAbsolutePath(), type, url, ext, true, true);    
                        // Compare string contents rather than references
                        if(fileDownloaded != null && !fileDownloaded.isEmpty())
                            lstLocalResearcherWebAddress.add(fileDownloaded);                            
                    }            
                    
                    lstResearcherWebAddress.clear();
                    lstResearcherWebAddress.addAll(lstLocalResearcherWebAddress);
            }
        }
    }
    catch(Exception ex)
    {            
        LOG.log(Level.SEVERE, "ERROR: "+ ex.getMessage());
    }
}
 
Developer: eduardoguzman, Project: sisob-data-extractor, Lines of code: 80, Source: DownloaderResearchersWebPagesTreeFormat.java
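
Example 8 builds its directory names with replaceAll("[^a-z^A-Z]", ""). In a Java regex character class only a leading ^ negates the class; a later ^ is a literal caret. That pattern therefore keeps carets as well as ASCII letters, which may not be what was intended. The sketch below uses only the standard library to contrast the two patterns. The class and method names are hypothetical and are not part of the quoted project.

```java
// Illustrative sketch only: DirNameSanitizer and its methods are hypothetical
// names, not part of the project quoted in Example 8.
final class DirNameSanitizer {

    // Pattern as written in Example 8: the second '^' is a literal caret,
    // so carets survive along with ASCII letters.
    static String asWritten(String name) {
        return name.replaceAll("[^a-z^A-Z]", "");
    }

    // Variant that keeps ASCII letters only.
    static String lettersOnly(String name) {
        return name.replaceAll("[^a-zA-Z]", "");
    }

    public static void main(String[] args) {
        String name = "Univ. of X^2 (2014)";
        System.out.println(asWritten(name));   // prints "UnivofX^"
        System.out.println(lettersOnly(name)); // prints "UnivofX"
    }
}
```

Whether the caret should be kept depends on which file names the target file system accepts, so treat the second variant as one possible choice.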

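Example 9: getCleanHtml

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.htmlcleaner.CleanerProperties; // class that declares the method
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.PrettyXmlSerializer;
import org.htmlcleaner.TagNode;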
/**
 * Clean HTML document and return XML as byte array
 * 
 * @param pandaSettings settings object that holds the map of resources
 * @param resID unique ID of resource
 * @return clean XHTML document as {@code byte[]}
 * @throws IOException
 */
private byte[] getCleanHtml(PandaSettings pandaSettings, String resID) throws IOException {
    byte[] doc = null;
    // Get local path to file, if null the URL field will be used to
    // retrieve resource
    ResourceInfo resInfo = pandaSettings.getResourceMap().getMap().get(resID);
    String filePath = resInfo.getFilePath();

    // properties for HTML cleaning
    CleanerProperties props = new CleanerProperties();
    // preserve namespace prefixes
    props.setNamespacesAware(true);
    // remove <?TAGNAME....> or <!TAGNAME....>
    props.setIgnoreQuestAndExclam(true);
    // do not split attributes with multiple words
    props.setAllowMultiWordAttributes(true);
    // omits <html> tag
    // props.setOmitHtmlEnvelope(true);
    // omit DTD
    props.setOmitDoctypeDeclaration(true);
    // omit xml declaration
    props.setOmitXmlDeclaration(true);
    // omit comments
    props.setOmitComments(true);
    // omit deprecated tags like <font...>
    props.setOmitDeprecatedTags(true);
    // treat script and style tag contents as CDATA
    props.setUseCdataForScriptAndStyle(true);
    // replace html character in form &#XXXX with real unicode characters
    props.setRecognizeUnicodeChars(true);
    // replace special entities with unicode character
    props.setTranslateSpecialEntities(true);
    // if true do not escape valid xml character sequences
    props.setAdvancedXmlEscape(true);

    // get HTML document, parse HTML
    TagNode tagNode = null;
    if (filePath != null) {
        tagNode = new HtmlCleaner(props).clean(new File(filePath));
    } else {
        // Get online resource; the stream is closed once parsing is done
        URL resURL = resInfo.getURL();
        try (InputStream htmlDoc = getOnlineResource(resURL)) {
            tagNode = new HtmlCleaner(props).clean(htmlDoc);
        }
    }

    PrettyXmlSerializer pXmlS = new PrettyXmlSerializer(props);
    doc = pXmlS.getAsString(tagNode).getBytes();

    return doc;
}
 
Developer: chsatgithub, Project: PANDA-DEEPLINKING, Lines of code: 59, Source: DataHtmlResource.java


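Note: The org.htmlcleaner.CleanerProperties.setOmitDoctypeDeclaration examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects contributed by their respective developers, and copyright of the source code remains with the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.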