This article collects typical usage examples of the Java class org.apache.mahout.clustering.iterator.DirichletClusteringPolicy. If you are wondering how the DirichletClusteringPolicy class is used in practice, the selected code examples below may help.
The DirichletClusteringPolicy class belongs to the org.apache.mahout.clustering.iterator package. Two code examples of the class are shown below, sorted by popularity by default.
Example 1: buildClusters
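import org.apache.mahout.clustering.iterator.DirichletClusteringPolicy; // import the required package/class
// The remaining imports are assumed from the package layout of Mahout releases
// that still ship Dirichlet clustering; adjust them for your version.
import java.io.IOException;
import java.util.List;
import com.google.common.collect.Lists;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.clustering.Model;
import org.apache.mahout.clustering.ModelDistribution;
import org.apache.mahout.clustering.classify.ClusterClassifier;
import org.apache.mahout.clustering.dirichlet.models.DistributionDescription;
import org.apache.mahout.clustering.iterator.ClusterIterator;
import org.apache.mahout.math.VectorWritable;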
/**
 * Iterate over the input vectors to produce cluster directories for each iteration
 *
 * @param conf
 *          the hadoop configuration
 * @param input
 *          the directory Path for input points
 * @param output
 *          the directory Path for output points
 * @param description
 *          model distribution parameters
 * @param numClusters
 *          the number of models to iterate over
 * @param maxIterations
 *          the maximum number of iterations
 * @param alpha0
 *          the alpha_0 value for the DirichletDistribution
 * @param runSequential
 *          execute sequentially if true
 *
 * @return the Path of the final clusters directory
 */
public static Path buildClusters(Configuration conf, Path input, Path output, DistributionDescription description,
    int numClusters, int maxIterations, double alpha0, boolean runSequential) throws IOException,
    ClassNotFoundException, InterruptedException {
  Path clustersIn = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
  // Sample the initial models from the prior of the configured distribution
  ModelDistribution<VectorWritable> modelDist = description.createModelDistribution(conf);
  List<Cluster> models = Lists.newArrayList();
  for (Model<VectorWritable> cluster : modelDist.sampleFromPrior(numClusters)) {
    models.add((Cluster) cluster);
  }
  // Persist the prior classifier together with its Dirichlet policy
  ClusterClassifier prior = new ClusterClassifier(models, new DirichletClusteringPolicy(numClusters, alpha0));
  prior.writeToSeqFiles(clustersIn);
  // Run the iterations either in-process or as MapReduce jobs
  if (runSequential) {
    ClusterIterator.iterateSeq(conf, input, clustersIn, output, maxIterations);
  } else {
    ClusterIterator.iterateMR(conf, input, clustersIn, output, maxIterations);
  }
  return output;
}
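The alpha0 parameter above is documented only as "the alpha_0 value for the DirichletDistribution". As a rough illustration of what such a concentration value controls, here is a minimal plain-Java sketch that draws mixture weights for numClusters components from a symmetric Dirichlet distribution. It does not use Mahout and is not Mahout's implementation: the class DirichletPriorSketch, the methods sampleGamma and sampleMixture, and the choice of giving every component the same concentration alpha0 are all assumptions made for this example.

```java
import java.util.Random;

// Hypothetical helper, not part of Mahout: samples mixture weights from a
// symmetric Dirichlet distribution with concentration alpha0 per component.
final class DirichletPriorSketch {

    // Draws one Gamma(shape, 1) variate using the Marsaglia-Tsang method.
    static double sampleGamma(double shape, Random rng) {
        if (shape < 1.0) {
            // For shape < 1: Gamma(a) = Gamma(a + 1) * U^(1/a), with U in (0, 1].
            double u = 1.0 - rng.nextDouble();
            return sampleGamma(shape + 1.0, rng) * Math.pow(u, 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rng.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0.0) {
                continue; // reject: the cubed value must be positive
            }
            v = v * v * v;
            double u = 1.0 - rng.nextDouble(); // in (0, 1], so log(u) is finite
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                return d * v;
            }
        }
    }

    // Normalizing independent Gamma(alpha0, 1) draws yields a Dirichlet sample.
    static double[] sampleMixture(int numClusters, double alpha0, Random rng) {
        double[] weights = new double[numClusters];
        double sum = 0.0;
        for (int i = 0; i < numClusters; i++) {
            weights[i] = sampleGamma(alpha0, rng);
            sum += weights[i];
        }
        for (int i = 0; i < numClusters; i++) {
            // Fall back to uniform weights if every draw underflowed to zero.
            weights[i] = sum > 0.0 ? weights[i] / sum : 1.0 / numClusters;
        }
        return weights;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        // Small alpha0 tends to concentrate mass on few components,
        // large alpha0 tends toward near-uniform weights.
        for (double alpha0 : new double[] {0.1, 1.0, 10.0}) {
            System.out.println("alpha0=" + alpha0 + " -> "
                + java.util.Arrays.toString(sampleMixture(5, alpha0, rng)));
        }
    }
}
```

In this sketch a small alpha0 tends to put most of the weight on a few models, while a large alpha0 spreads it more evenly, which is the usual intuition for choosing the value passed to buildClusters.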
Example 2: clusterData
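import org.apache.mahout.clustering.iterator.DirichletClusteringPolicy; // import the required package/class
// The remaining imports are assumed from the package layout of Mahout releases
// that still ship Dirichlet clustering; adjust them for your version.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.classify.ClusterClassificationDriver;
import org.apache.mahout.clustering.classify.ClusterClassifier;
import org.apache.mahout.clustering.topdown.PathDirectory;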
/**
 * Run the job using supplied arguments
 *
 * @param conf
 *          the hadoop configuration
 * @param input
 *          the directory pathname for input points
 * @param stateIn
 *          the directory pathname for input state
 * @param output
 *          the directory pathname for output points
 * @param alpha0
 *          the alpha_0 value for the DirichletDistribution
 * @param numModels
 *          the number of models
 * @param emitMostLikely
 *          a boolean if true emit only most likely cluster for each point
 * @param threshold
 *          a double threshold value emits all clusters having greater pdf (emitMostLikely = false)
 * @param runSequential
 *          execute sequentially if true
 */
public static void clusterData(Configuration conf, Path input, Path stateIn, Path output, double alpha0,
    int numModels, boolean emitMostLikely, double threshold, boolean runSequential) throws IOException,
    InterruptedException, ClassNotFoundException {
  // Store the Dirichlet policy alongside the input state before classifying the points
  ClusterClassifier.writePolicy(new DirichletClusteringPolicy(numModels, alpha0), stateIn);
  ClusterClassificationDriver.run(conf, input, output, new Path(output, PathDirectory.CLUSTERED_POINTS_DIRECTORY), threshold,
      emitMostLikely, runSequential);
}
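The Javadoc above describes two emission modes: with emitMostLikely set, only the most likely cluster is emitted for each point, otherwise every cluster whose pdf exceeds threshold is emitted. The following plain-Java sketch illustrates that selection rule on a single vector of per-cluster pdf values. It is an illustration of the documented behavior only, not the code path inside ClusterClassificationDriver; the class EmitSelectionSketch, the method selectClusters, and the strict "greater than" comparison are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: picks which cluster indices to
// emit for one point, given that point's pdf value under each cluster.
final class EmitSelectionSketch {

    static List<Integer> selectClusters(double[] pdf, boolean emitMostLikely, double threshold) {
        List<Integer> selected = new ArrayList<>();
        if (pdf.length == 0) {
            return selected; // nothing to emit without any clusters
        }
        if (emitMostLikely) {
            // Emit only the index with the highest pdf (first one wins on ties).
            int best = 0;
            for (int i = 1; i < pdf.length; i++) {
                if (pdf[i] > pdf[best]) {
                    best = i;
                }
            }
            selected.add(best);
        } else {
            // Emit every index whose pdf is strictly greater than the threshold.
            for (int i = 0; i < pdf.length; i++) {
                if (pdf[i] > threshold) {
                    selected.add(i);
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        double[] pdf = {0.1, 0.6, 0.3};
        System.out.println(selectClusters(pdf, true, 0.0));  // prints [1]
        System.out.println(selectClusters(pdf, false, 0.2)); // prints [1, 2]
    }
}
```

Note that in the emitMostLikely case the threshold plays no role in this sketch, which matches the Javadoc remark that threshold applies when emitMostLikely = false.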