

Java Tika.parseToString Method Code Examples

This article collects typical usage examples of the org.apache.tika.Tika.parseToString method in Java. If you are wondering how Tika.parseToString is used in practice, the curated examples here may help. You can also explore further usage examples of its enclosing class, org.apache.tika.Tika.


Four code examples of the Tika.parseToString method are shown below, sorted by popularity by default.

Example 1: open

import org.apache.tika.Tika; // import the package/class the method depends on
public void open(File file) {

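		Tika tika = new Tika();
		// Raise the cap on the number of characters parseToString returns
		tika.setMaxStringLength(999999);
		StringWriter sw = new StringWriter();
		PrintWriter pw = new PrintWriter(sw);
		
		fileName.setText(file.getName());
		// Derive the output name; lastIndexOf returns -1 when the name has no extension
		int dot = file.getName().lastIndexOf('.');
		String baseName = dot > 0 ? file.getName().substring(0, dot) : file.getName();
		String saveName = baseName + ".txt";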
		try {
			//long start = System.currentTimeMillis();
			String text = tika.parseToString(file);
			//long end = System.currentTimeMillis();
			pw.println(text);
			//mimeType.setText(tika.detect(file) + " (" + NumberFormat.getNumberInstance().format(end-start) + "ms)");
		} catch(Exception ex){
			ex.printStackTrace(pw);
		}
		
		pw.flush();

		plain.setText(sw.toString());
		saveFile(sw.toString(), saveName);
		plain.setCaretPosition(0);
	}
 
Developer: GeneZC, Project: Apache-tika-gui, Lines of code: 28, Source: TikaGUI.java
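
Example 1 builds its output file name by cutting the input name at the last dot, which fails with a StringIndexOutOfBoundsException when the name contains no dot at all. The following is a minimal, standard-library-only sketch of that naming step with the edge cases handled. The class name SaveNames and the method toTxtName are hypothetical and do not come from the Apache-tika-gui project; treating a leading-dot name such as ".bashrc" as having no extension is likewise an assumption.

```java
import java.io.File;

// Hypothetical helper, not part of the original project.
final class SaveNames {
    private SaveNames() {}

    /** Replaces the last extension of fileName with ".txt", or appends ".txt" if there is none. */
    static String toTxtName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // dot == -1: no extension; dot == 0: dot-file such as ".bashrc" (assumed to have no extension)
        String base = (dot > 0) ? fileName.substring(0, dot) : fileName;
        return base + ".txt";
    }

    public static void main(String[] args) {
        // File.getName() drops the directory part before the name is converted
        System.out.println(toTxtName(new File("docs/report.pdf").getName())); // prints "report.txt"
        System.out.println(toTxtName("README"));                              // prints "README.txt"
    }
}
```

Only the last extension is replaced, so a name like "archive.tar.gz" becomes "archive.tar.txt"; whether that is the desired result depends on the application.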

Example 2: parse

import org.apache.tika.Tika; // import the package/class the method depends on
@Override
public AldermanAttendance parse(Path targetFile) {
	try {
		Tika tika = new Tika();
		String content = tika.parseToString(targetFile.toFile());
		AldermanAttendance aldermanAttendance = ParserUtils.fromText(content);
		return aldermanAttendance;
	} catch (IOException | TikaException e) {
		e.printStackTrace();
		return null;
	}
}
 
Developer: sjcdigital, Project: presenca-vereadores-sjc, Lines of code: 13, Source: AnyDocumentAttendanceParser.java

Example 3: DocxToText

import org.apache.tika.Tika; // import the package/class the method depends on
public static String DocxToText(String docxFilePath)
    throws IOException, InvalidFormatException, XmlException, TikaException {
  // try-with-resources ensures the stream is closed even if an exception is thrown
  try (FileInputStream fis = new FileInputStream(docxFilePath)) {
    Tika tika = new Tika();
    return tika.parseToString(fis);
  }
}
 
Developer: mark-watson, Project: power-java, Lines of code: 10, Source: PoiMicrosoftFileReader.java

Example 4: processDocument

import org.apache.tika.Tika; // import the package/class the method depends on
@Override
public Document[] processDocument(Document document) {
  byte[] rawData = document.getRawData();
  if (rawData == null) {
    log.debug("Skipping document without data in " + getName());
    return new Document[]{document};
  }
  try {
    Tika tika = new Tika();
    // Cap the extracted text at the raw byte count instead of Tika's default limit
    tika.setMaxStringLength(rawData.length);
    Metadata metadata = new Metadata();
    try (ByteArrayInputStream bais = new ByteArrayInputStream(rawData)) {
      String textContent = tika.parseToString(bais, metadata);
      document.setRawData(textContent.getBytes(Charset.forName("UTF-8")));
      for (String name : metadata.names()) {
        document.put(sanitize(name) + plusSuffix(), metadata.get(name));
      }
    } catch (IOException | TikaException e) {
      log.warn("Tika processing failure!", e);
      // if tika can't parse it we certainly don't want random binary crap in the index
      document.setStatus(Status.DROPPED);
    }
  } catch (Throwable t) {
    boolean isAccessControl = t instanceof AccessControlException;
    boolean isSecurity = t instanceof SecurityException;
    if (!isAccessControl && !isSecurity) {
      throw t;
    } else {
      System.out.println("gotcha!");
    }
  }
  return new Document[]{document};
}
 
Developer: nsoft, Project: jesterj, Lines of code: 34, Source: TikaProcessor.java


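Note: The org.apache.tika.Tika.parseToString examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects contributed by their respective developers. Copyright in the source code belongs to the original authors; for distribution and use, refer to each project's License. Do not reproduce without permission.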