

Java CAS.setDocumentLanguage Method Code Example

This article collects a typical usage example of the org.apache.uima.cas.CAS.setDocumentLanguage method in Java. If you are wondering how CAS.setDocumentLanguage is used in practice, the example selected here may help. You can also look further into usage examples of the enclosing class, org.apache.uima.cas.CAS.


One code example of the CAS.setDocumentLanguage method is shown below.

Example 1: main

import org.apache.uima.cas.CAS; // import the package/class that the method depends on
public static void main(String[] args) throws IOException, InvalidXMLException, ResourceInitializationException,
		AnalysisEngineProcessException, CASException {
	if (args.length != 2) {
		System.err.println("Usage: OpenNlpTrainerExtractor <input folder> <output file>");
		// stop here: args[0] and args[1] are both required below
		System.exit(1);
	}

	AnalysisEngineDescription descriptor = (AnalysisEngineDescription) createResourceCreationSpecifier(
			new XMLInputSource(OpenNlpTrainerExtractor.class.getClassLoader().getResourceAsStream(
					"org/ie4opendata/octroy/SimpleFrenchTokenAndSentenceAnnotator.xml"), new File(".")),
			new Object[0]);
	AnalysisEngine engine = AnalysisEngineFactory.createEngine(descriptor);
	CAS cas = engine.newCAS();

	PrintWriter pw = new PrintWriter(new FileWriter(args[1]));

	for (File file : new File(args[0]).listFiles()) {
		StringBuilder doc = new StringBuilder();
		// try-with-resources closes the reader even if reading fails
		try (BufferedReader br = new BufferedReader(new FileReader(file))) {
			String line = br.readLine();
			while (line != null) {
				doc.append(line).append('\n');
				line = br.readLine();
			}
		}

		cas.reset();
		cas.setDocumentText(doc.toString());
		cas.setDocumentLanguage("fr");

		DocumentAnnotation documentAnnotation = new DocumentAnnotation(cas.getJCas());
		documentAnnotation.setDocumentName(file.getName());
		documentAnnotation.setClassified(false);
		documentAnnotation.addToIndexes();

		engine.process(cas);

		// one sentence per line, each token followed by a space
		JCas jcas = cas.getJCas();
		for (Sentence sentence : JCasUtil.select(jcas, Sentence.class)) {
			for (Token token : JCasUtil.selectCovered(Token.class, sentence)) {
				pw.print(token.getCoveredText() + " ");
			}
			pw.println();
		}
		// each document separated by an empty line
		pw.println();
	}
	pw.close();
}
 
Developer: IE4OpenData, Project: Octroy, Lines of code: 51, Source: OpenNlpTrainerExtractor.java
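
The example writes its output with one sentence per line, tokens separated by spaces, and an empty line after each document. The following is a minimal sketch of that layout using only the Java standard library, so it runs without UIMA. The class name `TrainingFormatSketch`, the `render` helper, and the sample French tokens are hypothetical and not part of the Octroy project. Unlike the original loop, which prints a space after every token, this sketch joins tokens so lines carry no trailing space, and it always uses `\n` rather than the platform line separator.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not from the Octroy project: it only illustrates
// the text layout that the example's nested loops produce.
class TrainingFormatSketch {

    /**
     * Renders documents as plain text: one sentence per line, tokens
     * separated by single spaces, and an empty line after each document.
     * Each document is a list of sentences; each sentence is a list of tokens.
     */
    static String render(List<List<List<String>>> documents) {
        StringBuilder out = new StringBuilder();
        for (List<List<String>> sentences : documents) {
            for (List<String> tokens : sentences) {
                // join avoids the trailing space that print(token + " ") leaves
                out.append(String.join(" ", tokens)).append('\n');
            }
            // empty line marks the end of a document
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the Sentence/Token annotations
        List<List<List<String>>> docs = Arrays.asList(
                Arrays.asList(
                        Arrays.asList("Tout", "va", "bien", "."),
                        Arrays.asList("Merci", ".")),
                Arrays.asList(
                        Arrays.asList("Au", "revoir", ".")));
        System.out.print(render(docs));
    }
}
```

In the original example the tokens come from `JCasUtil.selectCovered(Token.class, sentence)` after `engine.process(cas)`; here they are supplied directly so the formatting step can be examined in isolation.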


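Note: The org.apache.uima.cas.CAS.setDocumentLanguage example in this article was compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippet was selected from open-source projects contributed by their developers, and copyright in the source code belongs to the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.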