

Java Punctuation class code examples

This article collects typical code examples showing how the Java class uk.ac.man.cs.choif.nlp.surface.Punctuation is used in practice.


The Punctuation class belongs to the uk.ac.man.cs.choif.nlp.surface package. Three code examples that use it are shown below, sorted by popularity by default.

Example 1: normalize

import uk.ac.man.cs.choif.nlp.surface.Punctuation; // import the required package/class
/**
 * Given a document as a list of tokenised sentences,
 * this function produces a list of stem frequency tables
 * (context vectors), one per sentence.
 * Creation date: (11/05/99 03:43:34)
 * @return uk.ac.man.cs.choif.extend.structure.ContextVector[]
 * @param S java.lang.String[][]
 */


// Modified by Christine Jacquin on 28/09/10
// before: private final static method, now => protected
protected ContextVector[] normalize(final String[][] S) {
	
	//System.out.println("not going through the right normalise");

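	WordList stopword = WordList.stopwordList();
	ContextVector[] V = new ContextVector[S.length]; // one vector per sentence
	String token, stem;
	// Walk sentences (i) and their tokens (j) from last to first
	for (int i=S.length; i-->0;) {
		V[i] = new ContextVector();
		for (int j=S[i].length; j-->0;) {
			token = S[i][j].toLowerCase();
			// Keep only word tokens that are not stop words
			if (Punctuation.isWord(token) && !stopword.has(token)) {
				stem = Stemmer.stemOf(token);
				ContextVector.inc(stem, 1, V[i]); // add 1 to this stem's count in sentence i
			}
		}
	}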
		
	return V;
}
 
Developer ID: DrDub, Project: uima-text-segmenter, Lines of code: 33, Source: C99LINA.java
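The normalize method above lowercases each token, keeps it only if Punctuation.isWord accepts it and it is not a stop word, stems it, and increments that stem's count in the sentence's context vector. The following is a minimal, self-contained sketch of that same pipeline, not code from uima-text-segmenter. It assumes a Map<String, Integer> per sentence in place of ContextVector, and the stop-word list, isWord rule ("contains a letter or digit") and suffix-stripping stemmer are all hypothetical stand-ins; the stemmer in particular is NOT the Porter stemmer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sketch of the normalize() pattern: one stem-frequency table per sentence. */
public class StemFrequencySketch {

    // Hypothetical stop-word list (stand-in for WordList.stopwordList()).
    static final Set<String> STOPWORDS = Set.of("the", "a", "an", "on", "of", "and");

    // Hypothetical stand-in for Punctuation.isWord():
    // a token counts as a word if it contains at least one letter or digit.
    static boolean isWord(String token) {
        for (int k = 0; k < token.length(); k++) {
            if (Character.isLetterOrDigit(token.charAt(k))) {
                return true;
            }
        }
        return false;
    }

    // Crude suffix-stripping stand-in for Stemmer.stemOf(); NOT the Porter stemmer.
    static String stemOf(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    /** Returns one map of stem -> count per sentence, in sentence order. */
    static List<Map<String, Integer>> normalize(String[][] sentences) {
        List<Map<String, Integer>> vectors = new ArrayList<>();
        for (String[] sentence : sentences) {
            Map<String, Integer> v = new HashMap<>();
            for (String raw : sentence) {
                String token = raw.toLowerCase(Locale.ROOT);
                if (isWord(token) && !STOPWORDS.contains(token)) {
                    // Plays the role of ContextVector.inc(stem, 1, v)
                    v.merge(stemOf(token), 1, Integer::sum);
                }
            }
            vectors.add(v);
        }
        return vectors;
    }

    public static void main(String[] args) {
        String[][] doc = {
            {"The", "cats", "sat", "on", "the", "mat", "."},
            {"Cats", "like", "mats", "!"}
        };
        System.out.println(normalize(doc));
    }
}
```

Because the two sentences share the stems "cat" and "mat", their maps overlap; this shared-stem overlap is what makes per-sentence vectors useful for comparing sentences.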

Example 2: normalize

import uk.ac.man.cs.choif.nlp.surface.Punctuation; // import the required package/class
/** Redefines the normalize method of the superclass C99.
 * The code is the same except that the tokens and their stems are taken from the
 * output of the WST and Snowball components (stored in the tabTokenStem object).
 * The S parameter is not used; it is kept only because the signature is inherited.
 * The tabTokenStem array replaces S in the UIMA implementation.
 */
public ContextVector[] normalize(final String[][] S) {
	WordList stopword = WordList.stopwordList();
	ContextVector[] v = new ContextVector[rawText.getSentenceArrayOfTokenFeatureArray().length];
	String token, stem;
	for (int i=rawText.getSentenceArrayOfTokenFeatureArray().length; i-->0;) {
		v[i] = new ContextVector();
		for (int j=rawText.getSentenceArrayOfTokenFeatureArray()[i].length; j-->0;) {
			token = rawText.getSentenceArrayOfTokenFeatureArray()[i][j].getToken().toLowerCase();
			// Account for the behaviour of isWord(): it treats any token
			// containing "-" as a word, so a bare "-" would also be
			// counted as a word unless it is excluded here
			if (!token.equals("-")){
				if (Punctuation.isWord(token) && !stopword.has(token)) {
					stem = rawText.getSentenceArrayOfTokenFeatureArray()[i][j].getTokenFeature().toLowerCase();
					ContextVector.inc(stem, 1, v[i]);
				}
			}
		}
	}

	return v;

}
 
Developer ID: DrDub, Project: uima-text-segmenter, Lines of code: 30, Source: C99Parser.java
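This version differs from the previous one in two ways: the stem comes precomputed with each token instead of being computed by Stemmer.stemOf, and a bare "-" is skipped before isWord is consulted. The sketch below illustrates both points in isolation. It is not the project's code: the TokenFeature class only mirrors the getToken()/getTokenFeature() names from the snippet, and the isWord rule (letters, digits and '-' only) is a hypothetical stand-in chosen to reproduce the behaviour the comments describe, where anything containing "-" passes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sketch: counting precomputed stems, with the bare-hyphen guard. */
public class PrecomputedStemSketch {

    /** Hypothetical token/stem pair mirroring getToken()/getTokenFeature(). */
    static final class TokenFeature {
        private final String token;
        private final String feature; // stem supplied by an upstream stemmer

        TokenFeature(String token, String feature) {
            this.token = token;
            this.feature = feature;
        }

        String getToken() { return token; }

        String getTokenFeature() { return feature; }
    }

    static final Set<String> STOPWORDS = Set.of("the", "and", "of");

    // Hypothetical isWord(): tokens made only of letters, digits and '-'
    // count as words, so a lone "-" passes this test too.
    static boolean isWord(String token) {
        if (token.isEmpty()) {
            return false;
        }
        for (char c : token.toCharArray()) {
            if (!Character.isLetterOrDigit(c) && c != '-') {
                return false;
            }
        }
        return true;
    }

    /** Counts precomputed stems for one sentence. */
    static Map<String, Integer> countSentence(TokenFeature[] sentence) {
        Map<String, Integer> v = new HashMap<>();
        for (TokenFeature tf : sentence) {
            String token = tf.getToken().toLowerCase(Locale.ROOT);
            // Skip a bare hyphen explicitly, since isWord() would accept it
            if (!token.equals("-") && isWord(token) && !STOPWORDS.contains(token)) {
                String stem = tf.getTokenFeature().toLowerCase(Locale.ROOT);
                v.merge(stem, 1, Integer::sum);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        TokenFeature[] sentence = {
            new TokenFeature("State-of-the-art", "state-of-the-art"),
            new TokenFeature("-", "-"),
            new TokenFeature("Segmenters", "segment"),
            new TokenFeature("and", "and"),
            new TokenFeature("segmenting", "segment")
        };
        System.out.println(countSentence(sentence));
    }
}
```

Hyphenated words such as "state-of-the-art" are still counted; only the lone "-" is dropped. Because the stems arrive with the tokens, two surface forms like "segmenters" and "segmenting" collapse onto one count whenever the upstream stemmer maps them to the same stem.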

Example 3: normalize

import uk.ac.man.cs.choif.nlp.surface.Punctuation; // import the required package/class
/**
 * Given a document as a list of tokenised sentences,
 * this function produces a list of stem frequency tables
 * (context vectors), one per sentence.
 * Creation date: (11/05/99 03:43:34)
 * @return uk.ac.man.cs.choif.extend.structure.ContextVector[]
 * @param S java.lang.String[][]
 */
private final static ContextVector[] normalize(final String[][] S) {
	WordList stopword = WordList.stopwordList();
	ContextVector[] V = new ContextVector[S.length];

	String token, stem;
	for (int i=S.length; i-->0;) {
		V[i] = new ContextVector();
		for (int j=S[i].length; j-->0;) {
			token = S[i][j].toLowerCase();
			if (Punctuation.isWord(token) && !stopword.has(token)) {
				stem = Stemmer.stemOf(token);
				ContextVector.inc(stem, 1, V[i]);
			}
		}
	}
		
	return V;
}
 
Developer ID: DrDub, Project: uima-text-segmenter, Lines of code: 27, Source: C99.java
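In this original C99 version normalize is private final static, whereas the C99LINA version in Example 1 carries a note that it was changed to protected. One plausible reason for that change is a general Java rule: a private static method is bound at compile time and cannot be overridden, so a subclass cannot substitute its own normalize, while a protected instance method is dispatched dynamically. The sketch below demonstrates only that language rule. The class names are hypothetical and the code is not taken from uima-text-segmenter.

```java
/** Illustrates why a private static normalize() cannot be replaced by a subclass. */
public class OverrideSketch {

    static class StaticBase {
        // private static: bound at compile time, never overridden
        private static String normalize() { return "base"; }

        String run() { return normalize(); }
    }

    static class StaticSub extends StaticBase {
        // An unrelated method with the same name; StaticBase.run() never calls it
        private static String normalize() { return "sub"; }
    }

    static class ProtectedBase {
        // protected instance method: chosen at run time by the object's class
        protected String normalize() { return "base"; }

        String run() { return normalize(); }
    }

    static class ProtectedSub extends ProtectedBase {
        @Override
        protected String normalize() { return "sub"; }
    }

    public static void main(String[] args) {
        System.out.println(new StaticSub().run());    // prints "base"
        System.out.println(new ProtectedSub().run()); // prints "sub"
    }
}
```

Only in the protected variant does the inherited run() pick up the subclass's normalize().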


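Note: The uk.ac.man.cs.choif.nlp.surface.Punctuation examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippets are taken from open-source projects contributed by their developers, and copyright of the source code remains with the original authors. Before distributing or using the code, please refer to the license of the corresponding project. Do not reproduce this article without permission.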