

Java OOXMLParser Class Code Examples

This article collects typical usage examples of org.apache.tika.parser.microsoft.ooxml.OOXMLParser in Java. If you are wondering how the OOXMLParser class is used in practice, the selected examples below may help.


The OOXMLParser class belongs to the org.apache.tika.parser.microsoft.ooxml package. Seven code examples are shown below, sorted by popularity by default.

Example 1: convertWordDocumentIntoHtml

import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; // import the required package/class
/**
 * Converts a .docx document into HTML markup. This code
 * is based on <a href="http://stackoverflow.com/a/9053258/313554">this StackOverflow</a> answer.
 *
 * @param wordDocument  The .docx document to convert.
 * @return The original filename together with the generated HTML markup.
 */
public ConvertedDocumentDTO convertWordDocumentIntoHtml(MultipartFile wordDocument) {
    LOGGER.info("Converting word document: {} into HTML", wordDocument.getOriginalFilename());
    // try-with-resources closes the upload stream even if parsing fails
    try (InputStream input = wordDocument.getInputStream()) {
        Parser parser = new OOXMLParser();

        StringWriter sw = new StringWriter();
        SAXTransformerFactory factory = (SAXTransformerFactory)
                SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "utf-8");
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.setResult(new StreamResult(sw));

        Metadata metadata = new Metadata();
        metadata.add(Metadata.CONTENT_TYPE, "text/html;charset=utf-8");
        parser.parse(input, handler, metadata, new ParseContext());
        return new ConvertedDocumentDTO(wordDocument.getOriginalFilename(), sw.toString());
    }
    catch (IOException | SAXException | TransformerException | TikaException ex) {
        LOGGER.error("Conversion failed because an exception was thrown", ex);
        throw new DocumentConversionException(ex.getMessage(), ex);
    }
}
 
Developer: Vincit, Project: spring-boot-word-to-html-example, Lines of code: 33, Source file: WordToHtmlConverter.java
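
The key move in Example 1 is passing a TransformerHandler to the parser as its SAX ContentHandler, so the events the parser emits are serialized straight to HTML text. The sketch below isolates that step using only JDK classes: it fires a few SAX events by hand in place of a parser. The class name SaxToHtmlSketch and its render method are hypothetical, and INDENT is set to "no" here (unlike Example 1) only to keep the output compact.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper: serializes a hand-written SAX event stream to HTML.
class SaxToHtmlSketch {

    static String render(String paragraph)
            throws TransformerConfigurationException, SAXException {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "no");

        // The result must be set before the first SAX event arrives.
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // In Example 1 a parser would produce these events; here they are manual.
        AttributesImpl noAttrs = new AttributesImpl();
        char[] text = paragraph.toCharArray();
        handler.startDocument();
        handler.startElement("", "html", "html", noAttrs);
        handler.startElement("", "body", "body", noAttrs);
        handler.startElement("", "p", "p", noAttrs);
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endElement("", "body", "body");
        handler.endElement("", "html", "html");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("Hello from a SAX event stream"));
    }
}
```

Because the handler only sees SAX events, the same serialization setup should work for any parser that accepts a ContentHandler, which is why Example 1 can declare its variable as the Parser interface.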

Example 2: testSupports

import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; // import the required package/class
public void testSupports() throws Exception
{
    ArrayList<String> mimeTypes = new ArrayList<String>();
    for (Parser p : new Parser[] {
             new OfficeParser(), new OpenDocumentParser(),
             new Mp3Parser(), new OOXMLParser()
    }) {
       Set<MediaType> mts = p.getSupportedTypes(new ParseContext());
       for (MediaType mt : mts) 
       {
          mimeTypes.add(mt.toString());
       }
    }
    
    for (String mimetype : mimeTypes)
    {
        boolean supports = extracter.isSupported(mimetype);
        assertTrue("Mimetype should be supported: " + mimetype, supports);
    }
}
 
Developer: Alfresco, Project: alfresco-repository, Lines of code: 21, Source file: TikaAutoMetadataExtracterTest.java

Example 3: readXlsx

import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; // import the required package/class
public static ExcelData readXlsx(String xlsxFilePath)
    throws IOException, InvalidFormatException, XmlException, TikaException, SAXException {
  BodyContentHandler bcHandler = new BodyContentHandler();
  Metadata metadata = new Metadata();
  ParseContext pcontext = new ParseContext();
  OOXMLParser parser = new OOXMLParser();
  // try-with-resources closes the file even if parsing throws
  try (FileInputStream inputStream = new FileInputStream(new File(xlsxFilePath))) {
    parser.parse(inputStream, bcHandler, metadata, pcontext);
  }
  if (DEBUG_PRINT_META_DATA) {
    // Write the header and the entries to the same stream so they stay in order
    System.err.println("Metadata:");
    for (String name : metadata.names()) {
      System.err.println(name + "\t:\t" + metadata.get(name));
    }
  }
  return new ExcelData(bcHandler.toString());
}
 
Developer: mark-watson, Project: power-java, Lines of code: 17, Source file: PoiMicrosoftFileReader.java
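
Example 3 uses the no-argument BodyContentHandler, while Examples 6 and 7 pass -1. As far as the Tika Javadoc describes it, the no-argument form caps the collected text (at 100,000 characters) and raises a SAXException once the cap is hit, and -1 disables the cap. The sketch below mimics that behavior with a plain JDK DefaultHandler to make the difference concrete. It is not Tika's implementation; the class name BoundedTextHandlerSketch and its constructor parameter are hypothetical.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a text-collecting handler with a write limit.
class BoundedTextHandlerSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit; // -1 means "no limit" in this sketch

    BoundedTextHandlerSketch(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit >= 0 && text.length() + length > writeLimit) {
            // Keep whatever still fits, then signal that the limit was reached.
            text.append(ch, start, writeLimit - text.length());
            throw new SAXException(
                    "write limit of " + writeLimit + " characters reached");
        }
        text.append(ch, start, length);
    }

    @Override
    public String toString() {
        return text.toString();
    }

    public static void main(String[] args) {
        char[] data = "hello world".toCharArray();
        BoundedTextHandlerSketch capped = new BoundedTextHandlerSketch(5);
        try {
            capped.characters(data, 0, data.length);
        } catch (SAXException e) {
            // The text collected before the limit is still available.
            System.out.println("truncated to: " + capped);
        }
    }
}
```

For large spreadsheets, a caller of readXlsx may therefore want to either pass an explicit limit or catch the exception and use the partial text.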

Example 4: getParser

import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; // import the required package/class
@Override
protected Parser getParser() {
   return new OOXMLParser();
}
 
Developer: Alfresco, Project: alfresco-repository, Lines of code: 5, Source file: PoiOOXMLContentTransformer.java

Example 5: getParser

import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; // import the required package/class
@Override
protected Parser getParser() 
{
    return new OOXMLParser();
}
 
Developer: Alfresco, Project: alfresco-repository, Lines of code: 6, Source file: PoiMetadataExtracter.java

Example 6: testExcelXLSB

import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; // import the required package/class
/**
 * We don't currently support the .xlsb file format 
 *  (an OOXML container with binary blobs), but we 
 *  shouldn't break on these files either (TIKA-826)  
 */
@Test
public void testExcelXLSB() throws Exception {
   Detector detector = new DefaultDetector();
   AutoDetectParser parser = new AutoDetectParser();
   
   InputStream input = ExcelParserTest.class.getResourceAsStream(
         "/test-documents/testEXCEL.xlsb");
   Metadata m = new Metadata();
   m.add(Metadata.RESOURCE_NAME_KEY, "excel.xlsb");
   
   // Should be detected correctly
   MediaType type = null;
   try {
      type = detector.detect(input, m);
      assertEquals("application/vnd.ms-excel.sheet.binary.macroenabled.12", type.toString());
   } finally {
      input.close();
   }
   
   // OfficeParser won't handle it
   assertEquals(false, (new OfficeParser()).getSupportedTypes(new ParseContext()).contains(type));
   
   // OOXMLParser won't handle it
   assertEquals(false, (new OOXMLParser()).getSupportedTypes(new ParseContext()).contains(type));
   
   // AutoDetectParser doesn't break on it
   input = ExcelParserTest.class.getResourceAsStream("/test-documents/testEXCEL.xlsb");

   try {
      ContentHandler handler = new BodyContentHandler(-1);
      ParseContext context = new ParseContext();
      context.set(Locale.class, Locale.US);
      parser.parse(input, handler, m, context);

      String content = handler.toString();
      assertEquals("", content);
   } finally {
      input.close();
   }
}
 
Developer: kanrourou, Project: software-testing, Lines of code: 46, Source file: ExcelParserTest.java

Example 7: testExcel95

import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; // import the required package/class
/**
 * We don't currently support the old Excel 95 .xls file format, 
 *  but we shouldn't break on these files either (TIKA-976)  
 */
@Test
public void testExcel95() throws Exception {
   Detector detector = new DefaultDetector();
   AutoDetectParser parser = new AutoDetectParser();
   
   InputStream input = ExcelParserTest.class.getResourceAsStream(
         "/test-documents/testEXCEL_95.xls");
   Metadata m = new Metadata();
   m.add(Metadata.RESOURCE_NAME_KEY, "excel_95.xls");
   
   // Should be detected correctly
   MediaType type = null;
   try {
      type = detector.detect(input, m);
      assertEquals("application/vnd.ms-excel", type.toString());
   } finally {
      input.close();
   }
   
   // OfficeParser will claim to handle it
   assertEquals(true, (new OfficeParser()).getSupportedTypes(new ParseContext()).contains(type));
   
   // OOXMLParser won't handle it
   assertEquals(false, (new OOXMLParser()).getSupportedTypes(new ParseContext()).contains(type));
   
   // AutoDetectParser doesn't break on it
   input = ExcelParserTest.class.getResourceAsStream("/test-documents/testEXCEL_95.xls");

   try {
      ContentHandler handler = new BodyContentHandler(-1);
      ParseContext context = new ParseContext();
      context.set(Locale.class, Locale.US);
      parser.parse(input, handler, m, context);

      String content = handler.toString();
      assertEquals("", content);
   } finally {
      input.close();
   }
}
 
Developer: kanrourou, Project: software-testing, Lines of code: 45, Source file: ExcelParserTest.java
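
Examples 6 and 7 deal with two different container families: .xlsb is an OOXML (ZIP-based) package, while an old .xls file is an OLE2 compound document. As a rough illustration of how the two differ at the byte level, the sketch below peeks at the leading bytes of a stream: ZIP files begin with "PK\003\004" and OLE2 files begin with D0 CF 11 E0 A1 B1 1A E1. This is only a simplified illustration and not Tika's detection logic, which also considers the resource name and the container contents; the class name ContainerSniffSketch and the sniff method are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: guesses the container family from leading magic bytes.
class ContainerSniffSketch {
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    private static final int[] OLE2_MAGIC =
            {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    static String sniff(InputStream in) throws IOException {
        // Read up to 8 bytes; a single read() call may return fewer, so loop.
        byte[] head = new byte[8];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        if (startsWith(head, n, ZIP_MAGIC)) {
            return "zip";   // OOXML packages (.docx, .xlsx, .xlsb) are ZIP files
        }
        if (startsWith(head, n, OLE2_MAGIC)) {
            return "ole2";  // legacy .doc / .xls / .ppt containers
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] head, int available, int[] magic) {
        if (available < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((head[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0, 0, 0, 0};
        System.out.println(sniff(new ByteArrayInputStream(zipLike)));
    }
}
```

A magic-byte check only identifies the container, not the document type inside it, which is consistent with the tests above asking each parser for its supported media types rather than relying on the container alone.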


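Note: The org.apache.tika.parser.microsoft.ooxml.OOXMLParser class examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright of the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.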