Java Instance Class Code Examples

This article collects typical usage examples of the cc.mallet.types.Instance class in Java. If you are wondering how the Instance class is used in practice, the selected examples below may help.

The Instance class belongs to the cc.mallet.types package. Fifteen code examples of the class are shown below, sorted by popularity by default.

Example 1: collectConstraints

import cc.mallet.types.Instance; // import the required package/class
public void collectConstraints (InstanceList ilist)
{
  for (int inum = 0; inum < ilist.size(); inum++) {
    logger.finest ("*** Collecting constraints for instance "+inum);
    Instance inst = ilist.get (inum);
    ACRF.UnrolledGraph unrolled = new ACRF.UnrolledGraph (inst, templates, null, true);
    Assignment assn = unrolled.getAssignment ();
    collectConstraintsForGraph (unrolled, assn);
  }
}

Developer ID: mimno, Project: GRMM, Lines of code: 11, Source file: PseudolikelihoodACRFTrainer.java

Example 2: getUnnormalizedClassificationScores

import cc.mallet.types.Instance; // import the required package/class
public void getUnnormalizedClassificationScores (Instance instance, double[] scores)
{
	//  arrayOutOfBounds if pipe has grown since training
	//        int numFeatures = getAlphabet().size() + 1;
	int numFeatures = this.defaultFeatureIndex + 1;

	int numLabels = getLabelAlphabet().size();
	assert (scores.length == numLabels);
	FeatureVector fv = (FeatureVector) instance.getData ();
	// Make sure the feature vector's feature dictionary matches
	// what we are expecting from our data pipe (and thus our notion
	// of feature probabilities).
	assert (fv.getAlphabet ()
			== this.instancePipe.getDataAlphabet ());

	// Include the feature weights according to each label
	for (int li = 0; li < numLabels; li++) {
		scores[li] = parameters[li*numFeatures + defaultFeatureIndex]
				+ MatrixOps.rowDotProduct (parameters, numFeatures,
						li, fv,
						defaultFeatureIndex,
						(perClassFeatureSelection == null
								? featureSelection
								: perClassFeatureSelection[li]));
	}
}

Developer ID: kostagiolasn, Project: NucleosomePatternClassifier, Lines of code: 27, Source file: MaxEnt.java
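
The scoring loop in Example 2 reads a flat parameter array in which each label owns one row of numFeatures weights, with the slot at defaultFeatureIndex acting as a bias. The following is a minimal sketch of that layout using plain arrays only. The class FlatScoreSketch and its method are hypothetical names introduced here, the sparse feature vector is represented as parallel index/value arrays, and skipping indices at or beyond defaultFeatureIndex is an assumption of this sketch (motivated by the source comment about the pipe growing after training), not a statement about what MatrixOps.rowDotProduct does.

```java
import java.util.Arrays;

// Plain-array stand-in for the scoring loop in Example 2; it does not use MALLET.
class FlatScoreSketch {
    // parameters is row-major: one row of (defaultFeatureIndex + 1) weights per label,
    // and the last slot of each row holds that label's bias term.
    static double[] scores(double[] parameters, int numLabels, int defaultFeatureIndex,
                           int[] indices, double[] values) {
        int numFeatures = defaultFeatureIndex + 1;
        double[] scores = new double[numLabels];
        for (int li = 0; li < numLabels; li++) {
            double sum = parameters[li * numFeatures + defaultFeatureIndex]; // bias
            for (int k = 0; k < indices.length; k++) {
                // Assumption of this sketch: ignore features outside the trained range.
                if (indices[k] < defaultFeatureIndex) {
                    sum += parameters[li * numFeatures + indices[k]] * values[k];
                }
            }
            scores[li] = sum;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Two labels, two real features plus one bias slot per label.
        double[] parameters = {1.0, 2.0, 0.5,     // label 0: w0, w1, bias
                               -1.0, 0.0, 0.25};  // label 1: w0, w1, bias
        double[] s = scores(parameters, 2, 2, new int[] {0, 1}, new double[] {1.0, 3.0});
        System.out.println(Arrays.toString(s));
    }
}
```

The scores are unnormalized, as in the original method; a caller that needs probabilities would still have to exponentiate and normalize them.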

Example 3: pipe

import cc.mallet.types.Instance; // import the required package/class
public Instance pipe (Instance carrier)
{
	TokenSequence ts = (TokenSequence) carrier.getData();
	// xxx This doesn't seem so efficient.  Perhaps have TokenSequence
	// use a LinkedList, and remove Tokens from it? -?
	// But a LinkedList implementation of TokenSequence would be quite inefficient -AKM
	TokenSequence ret = new TokenSequence ();
	Token prevToken = null;
	for (int i = 0; i < ts.size(); i++) {
		Token t = ts.get(i);
		String s = t.getText();
		if (CharSequenceLexer.LEX_ALPHA.matcher(s).matches()) {
			ret.add (t);
			prevToken = t;
		} else if (markDeletions && prevToken != null) {
			prevToken.setProperty (FeatureSequenceWithBigrams.deletionMark, t.getText());
		}
	}
	carrier.setData(ret);
	return carrier;
}

Developer ID: kostagiolasn, Project: NucleosomePatternClassifier, Lines of code: 21, Source file: TokenSequenceRemoveNonAlpha.java

Example 4: libSVMInstanceIndepFromMalletInstance

import cc.mallet.types.Instance; // import the required package/class
public static svm_node[] libSVMInstanceIndepFromMalletInstance(
        cc.mallet.types.Instance malletInstance) {

  // TODO: maybe check that data is really a sparse vector? Should be in all cases
  // except if we have an instance from MalletSeq
  SparseVector data = (SparseVector) malletInstance.getData();
  int[] indices = data.getIndices();
  double[] values = data.getValues();
  svm_node[] nodearray = new svm_node[indices.length];
  for (int j = 0; j < indices.length; j++) {
    svm_node node = new svm_node();
    node.index = indices[j]+1;   // NOTE: LibSVM locations have to start with 1
    node.value = values[j];
    nodearray[j] = node;
  }
  return nodearray;
}

Developer ID: GateNLP, Project: gateplugin-LearningFramework, Lines of code: 20, Source file: CorpusRepresentationLibSVM.java
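
The key step in Example 4 is the shift from zero-based feature indices to the one-based indices that LibSVM expects. The sketch below isolates that step so it can be run without LibSVM or MALLET on the classpath. Node is a hypothetical stand-in for svm_node, and OneBasedNodesSketch and toNodes are names introduced here for illustration; the input is assumed to be the parallel index/value arrays a sparse vector would expose.

```java
// Stand-alone illustration of the zero-based to one-based index shift.
class OneBasedNodesSketch {
    // Hypothetical stand-in for libsvm's svm_node: an index/value pair.
    static final class Node {
        final int index;
        final double value;

        Node(int index, double value) {
            this.index = index;
            this.value = value;
        }
    }

    static Node[] toNodes(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have equal length");
        }
        Node[] nodes = new Node[indices.length];
        for (int j = 0; j < indices.length; j++) {
            // Shift by one so the smallest possible index is 1.
            nodes[j] = new Node(indices[j] + 1, values[j]);
        }
        return nodes;
    }

    public static void main(String[] args) {
        Node[] nodes = toNodes(new int[] {0, 4}, new double[] {1.5, 2.0});
        for (Node n : nodes) {
            System.out.println(n.index + ":" + n.value);
        }
    }
}
```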

Example 5: classify

import cc.mallet.types.Instance; // import the required package/class
/**
 * Classifies an instance using Winnow's weights
 * @param instance an instance to be classified
 * @return an object containing the classifier's guess
 */
public Classification classify (Instance instance){
	int numClasses = getLabelAlphabet().size();
	double[] scores = new double[numClasses];
	FeatureVector fv = (FeatureVector) instance.getData ();
	// Make sure the feature vector's feature dictionary matches
	// what we are expecting from our data pipe (and thus our notion
	// of feature probabilities).
	assert (instancePipe == null || fv.getAlphabet () == this.instancePipe.getDataAlphabet ());
	int fvisize = fv.numLocations();
	
	// Set the scores by summing wi*xi
	for (int fvi = 0; fvi < fvisize; fvi++) {
		int fi = fv.indexAtLocation (fvi);
		for (int ci = 0; ci < numClasses; ci++)
			scores[ci] += this.weights[ci][fi];
	}

	// Create and return a Classification object
	return new Classification (instance, this,
			new LabelVector (getLabelAlphabet(), scores));
}

Developer ID: kostagiolasn, Project: NucleosomePatternClassifier, Lines of code: 29, Source file: Winnow.java

Example 6: pipe

import cc.mallet.types.Instance; // import the required package/class
public Instance pipe(Instance carrier) {
	AgglomerativeNeighbor neighbor = (AgglomerativeNeighbor) carrier.getData();
	Clustering original = neighbor.getOriginal();
	int[] cluster1 = neighbor.getOldClusters()[0];
	int[] cluster2 = neighbor.getOldClusters()[1];
	InstanceList list = original.getInstances();
	int[] mergedIndices = neighbor.getNewCluster();
	Record[] records = array2Records(mergedIndices, list);
	Alphabet fieldAlph = records[0].fieldAlphabet();
	Alphabet valueAlph = records[0].valueAlphabet();

	PropertyList features = null;
	features = addExactMatch(records, fieldAlph, valueAlph, features);
	features = addApproxMatch(records, fieldAlph, valueAlph, features);
	features = addSubstringMatch(records, fieldAlph, valueAlph, features);
	carrier.setData(new FeatureVector(getDataAlphabet(), features, true));

	LabelAlphabet ldict = (LabelAlphabet) getTargetAlphabet();
	String label = (original.getLabel(cluster1[0]) == original.getLabel(cluster2[0]))
			? "YES" : "NO";
	carrier.setTarget(ldict.lookupLabel(label));
	return carrier;
}

Developer ID: kostagiolasn, Project: NucleosomePatternClassifier, Lines of code: 27, Source file: Clusterings2Clusterer.java

Example 7: pipe

import cc.mallet.types.Instance; // import the required package/class
@Override
public Instance pipe(Instance carrier) {

    TokenSequence ts = (TokenSequence) carrier.getData();
    for (int i = 0; i < ts.size(); i++) {
        Token t = ts.get(i);
        int splitLength = t.getText().split("\t").length;
        if (splitLength == this.minLineLength) {
            t.setText("O\t" + t.getText());
        } else {
            if (splitLength != (this.minLineLength + 1)) {
                System.err.println("input line does not have length " + this.minLineLength + " or "
                        + (this.minLineLength + 1) + " but " + splitLength + ": " + t.getText());
            }
        }
    }
    carrier.setData(ts);

    return carrier;
}

Developer ID: exciteproject, Project: refext, Lines of code: 21, Source file: AddTargetToLinePipe.java

Example 8: pipe

import cc.mallet.types.Instance; // import the required package/class
@Override
public Instance pipe(Instance carrier) {
    TokenSequence tokenSequence = (TokenSequence) carrier.getData();
    for (int i = 0; i < tokenSequence.size(); i++) {
        Token token = tokenSequence.get(i);
        String tokenText = token.getText().split(this.csvSeparator)[0];
        int count = 0;
        Matcher matcher = this.pattern.matcher(tokenText);
        while (matcher.find()) {
            count++;
        }
        // int count = StringUtils.countMatches(tokenText, this.subString);
        if (count > 0) {
            // token.setFeatureValue(this.feature + "=" + count, 1.0);
            token.setFeatureValue(this.feature, count);
        }
    }
    return carrier;
}

Developer ID: exciteproject, Project: refext, Lines of code: 20, Source file: CountMatchesPipe.java
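
Examples 8 and 11 both count regular-expression matches by calling Matcher.find() in a loop. The sketch below pulls that counting step into a small helper that runs with the JDK alone. CountMatchesSketch and countMatches are names introduced here for illustration, and the sample pattern and text are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts non-overlapping matches of a pattern, as the pipes above do per token.
class CountMatchesSketch {
    static int countMatches(Pattern pattern, CharSequence text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        // Each successful find() advances past the previous match.
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern commas = Pattern.compile(",");
        System.out.println(countMatches(commas, "Smith, J., Doe, A."));
    }
}
```

Compiling the Pattern once and reusing it for every token, as the pipes do with this.pattern, avoids recompiling the expression inside the loop.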

Example 9: pipe

import cc.mallet.types.Instance; // import the required package/class
@Override
public Instance pipe(Instance carrier) {

    TokenSequence ts = (TokenSequence) carrier.getData();
    TokenSequence targetTokenSeq = new TokenSequence(ts.size());

    for (int i = 0; i < ts.size(); i++) {

        Token t = ts.get(i);
        // System.out.println(t.getText());
        String lineWithoutFirst = t.getText().replaceFirst("[^\\t]*\t", "");
        // System.out.println(lineWithoutFirst);
        // targetTokenSeq.add(lineSplit[0]);

        targetTokenSeq.add(t.getText().split("\t")[0]);
        t.setText(lineWithoutFirst);

    }
    carrier.setTarget(targetTokenSeq);
    carrier.setData(ts);

    return carrier;
}

Developer ID: exciteproject, Project: refext, Lines of code: 24, Source file: LineToTargetTextPipe.java
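
Example 9 treats the first tab-separated field of each line as the target label and keeps the remainder as data. A minimal sketch of that split on a single string follows. LabelSplitSketch and splitFirstField are hypothetical names, and the sample line is invented. One deliberate difference: when a line contains no tab, the pipe above leaves the text unchanged, whereas this sketch returns an empty remainder.

```java
// Splits "label<TAB>rest" into its two parts without using a regular expression.
class LabelSplitSketch {
    // Returns a two-element array {label, rest}.
    static String[] splitFirstField(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // No tab: the whole line is the label and nothing remains (sketch-specific choice).
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    public static void main(String[] args) {
        String[] parts = splitFirstField("B-REF\tSmith\t2001");
        System.out.println("label=" + parts[0]);
        System.out.println("rest=" + parts[1]);
    }
}
```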

Example 10: pipe

import cc.mallet.types.Instance; // import the required package/class
@Override
public Instance pipe(Instance carrier) {
    this.referenceSectionFound = false;
    TokenSequence tokenSequence = (TokenSequence) carrier.getData();
    for (Token token : tokenSequence) {
        String tokenText = token.getText().split(this.csvSeparator)[0];

        if (tokenText.contains("Literaturverzeichnis") || tokenText.contains("Quellennachweise")
                || tokenText.contains("References") || tokenText.contains("REFERENCES")
                || tokenText.contains("Notes") || tokenText.contains("Literatur")
                || tokenText.contains("LITERATURVERZEICHNIS")) {
            this.referenceSectionFound = true;
        }
        if (this.referenceSectionFound) {
            token.setFeatureValue(this.feature, 1.0);
        }
    }
    return carrier;
}

Developer ID: exciteproject, Project: refext, Lines of code: 20, Source file: ReferenceSectionPipe.java

Example 11: pipe

import cc.mallet.types.Instance; // import the required package/class
@Override
public Instance pipe(Instance carrier) {
    TokenSequence tokenSequence = (TokenSequence) carrier.getData();
    int prevCount = 0;
    for (int i = 0; i < tokenSequence.size(); i++) {
        Token token = tokenSequence.get(i);
        String tokenText = token.getText().split(this.csvSeparator)[0];
        int count = 0;
        Matcher matcher = this.pattern.matcher(tokenText);
        while (matcher.find()) {
            count++;
        }
        // int count = StringUtils.countMatches(tokenText, this.subString);
        if (count < prevCount) {
            token.setFeatureValue(this.feature, 1.0);
        }
        prevCount = count;
    }
    return carrier;
}

Developer ID: exciteproject, Project: refext, Lines of code: 21, Source file: ShorterLinePipe.java

Example 12: pipe

import cc.mallet.types.Instance; // import the required package/class
@Override
public Instance pipe(Instance carrier) {

    TokenSequence targets = (TokenSequence) carrier.getTarget();

    for (int i = 0; i < targets.size(); i++) {

        Token target = targets.get(i);
        // System.out.println(t.getText());
        String targetLabel = target.getText();
        // System.out.println(lineWithoutFirst);
        // targetTokenSeq.add(lineSplit[0]);
        if (this.replacementMap.containsKey(targetLabel)) {
            target.setText(this.replacementMap.get(targetLabel));
        }
    }
    carrier.setTarget(targets);

    return carrier;
}

Developer ID: exciteproject, Project: refext, Lines of code: 21, Source file: TargetReplacementPipe.java

Example 13: pipe

import cc.mallet.types.Instance; // import the required package/class
public Instance pipe (Instance carrier)
{
	TokenSequence ts = (TokenSequence) carrier.getData();
	int tsSize = ts.size();
	for (int i = tsSize-1; i >= 0; i--) {
		Token t = ts.get (i);
		String text = t.getText();
		if (featureRegex != null && !featureRegex.matcher(text).matches())
			continue;
		for (int j = 0; j < i; j++) {
			if (ts.get(j).getText().equals(text)) {
				PropertyList.Iterator iter = ts.get(j).getFeatures().iterator();
				while (iter.hasNext()) {
					iter.next();
					String key = iter.getKey();
					if (filterRegex == null || (filterRegex.matcher(key).matches() ^ !includeFiltered))
						t.setFeatureValue (namePrefix+key, iter.getNumericValue());
				}
				break;
			}
			if (firstMentionName != null)
				t.setFeatureValue (firstMentionName, 1.0);
		}
	}
	return carrier;
}

Developer ID: kostagiolasn, Project: NucleosomePatternClassifier, Lines of code: 27, Source file: FeaturesOfFirstMention.java

Example 14: incorporateOneInstance

import cc.mallet.types.Instance; // import the required package/class
private void incorporateOneInstance (Instance instance, double instanceWeight) 
{
  Labeling labeling = instance.getLabeling ();
  if (labeling == null) return; // Handle unlabeled instances by skipping them
  FeatureVector fv = (FeatureVector) instance.getData ();
  double oneNorm = fv.oneNorm();
  if (oneNorm <= 0) return; // Skip instances that have no features present
  if (docLengthNormalization > 0)
  	// Make the document have counts that sum to docLengthNormalization
  	// I.e., if 20, it would be as if the document had 20 words.
  	instanceWeight *= docLengthNormalization / oneNorm;
  assert (instanceWeight > 0 && !Double.isInfinite(instanceWeight));
  for (int lpos = 0; lpos < labeling.numLocations(); lpos++) {
    int li = labeling.indexAtLocation (lpos);
    double labelWeight = labeling.valueAtLocation (lpos);
    if (labelWeight == 0) continue;
    //System.out.println ("NaiveBayesTrainer me.increment "+ labelWeight * instanceWeight);
    me[li].increment (fv, labelWeight * instanceWeight);
    // This relies on labelWeight summing to 1 over all labels
    pe.increment (li, labelWeight * instanceWeight);
  }
}

Developer ID: kostagiolasn, Project: NucleosomePatternClassifier, Lines of code: 23, Source file: NaiveBayesTrainer.java
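
Example 14 rescales each instance's weight so that its feature counts behave as if they summed to docLengthNormalization. The sketch below reproduces only that arithmetic on a plain array of feature values. DocLengthWeightSketch and normalizedWeight are hypothetical names, and returning 0.0 for an instance with no features is a choice made here to stand in for the original method's early return.

```java
// Illustrates the document-length normalization step from Example 14 in isolation.
class DocLengthWeightSketch {
    static double normalizedWeight(double instanceWeight, double[] featureValues,
                                   double docLengthNormalization) {
        // One-norm: the sum of absolute feature values (total count for count features).
        double oneNorm = 0.0;
        for (double v : featureValues) {
            oneNorm += Math.abs(v);
        }
        if (oneNorm <= 0) {
            return 0.0; // sketch-specific: the original method skips such instances
        }
        if (docLengthNormalization > 0) {
            // Scale so the document counts as if it had docLengthNormalization words.
            instanceWeight *= docLengthNormalization / oneNorm;
        }
        return instanceWeight;
    }

    public static void main(String[] args) {
        // A 40-word document normalized to an effective length of 20 words.
        System.out.println(normalizedWeight(1.0, new double[] {10.0, 30.0}, 20.0));
    }
}
```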

Example 15: main

import cc.mallet.types.Instance; // import the required package/class
public static void main (String[] args)
{
	try {
		Pipe p = new SerialPipes (new Pipe[] {
			new Input2CharSequence (),
			new SGML2TokenSequence()
//			new SGML2TokenSequence (new CharSequenceLexer (Pattern.compile (".")), "O")
		});

		for (int i = 0; i < args.length; i++) {
			Instance carrier = p.instanceFrom(new Instance (new File(args[i]), null, null, null));
			TokenSequence data = (TokenSequence) carrier.getData();
			TokenSequence target = (TokenSequence) carrier.getTarget();
			logger.finer ("===");
			logger.info (args[i]);
			for (int j = 0; j < data.size(); j++)
				logger.info (target.get(j).getText()+" "+data.get(j).getText());
		}
	} catch (Exception e) {
		System.out.println (e);
		e.printStackTrace();
	}
}

Developer ID: kostagiolasn, Project: NucleosomePatternClassifier, Lines of code: 24, Source file: SGML2TokenSequence.java

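Note: The cc.mallet.types.Instance class examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as Github and MSDocs. The code snippets were selected from open-source projects contributed by their respective developers, and copyright in the source code belongs to the original authors. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.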