Java Cluster.INITIAL_CLUSTERS_DIR Field Code Examples

This article collects typical usage examples of the org.apache.mahout.clustering.Cluster.INITIAL_CLUSTERS_DIR field in Java. If you are wondering how Cluster.INITIAL_CLUSTERS_DIR is used in practice, the selected examples here may help. You can also look at usage examples of the enclosing class, org.apache.mahout.clustering.Cluster.

Four code examples of the Cluster.INITIAL_CLUSTERS_DIR field follow, sorted by popularity by default.

Example 1: buildClusters

/**
 * Iterate over the input vectors to produce cluster directories for each iteration
 * 
 * @param conf
 *          the hadoop configuration
 * @param input
 *          the directory Path for input points
 * @param output
 *          the directory Path for output points
 * @param description
 *          model distribution parameters
 * @param numClusters
 *          the number of models to iterate over
 * @param maxIterations
 *          the maximum number of iterations
 * @param alpha0
 *          the alpha_0 value for the DirichletDistribution
 * @param runSequential
 *          execute sequentially if true
 * 
 * @return the Path of the final clusters directory
 */
public static Path buildClusters(Configuration conf, Path input, Path output, DistributionDescription description,
    int numClusters, int maxIterations, double alpha0, boolean runSequential) throws IOException,
    ClassNotFoundException, InterruptedException {
  Path clustersIn = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
  ModelDistribution<VectorWritable> modelDist = description.createModelDistribution(conf);
  
  List<Cluster> models = Lists.newArrayList();
  for (Model<VectorWritable> cluster : modelDist.sampleFromPrior(numClusters)) {
    models.add((Cluster) cluster);
  }
  
  ClusterClassifier prior = new ClusterClassifier(models, new DirichletClusteringPolicy(numClusters, alpha0));
  prior.writeToSeqFiles(clustersIn);
  
  if (runSequential) {
    ClusterIterator.iterateSeq(conf, input, clustersIn, output, maxIterations);
  } else {
    ClusterIterator.iterateMR(conf, input, clustersIn, output, maxIterations);
  }
  return output;
}

Developer: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines of code: 44, Source: DirichletDriver.java
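The listing builds its prior-clusters location with `new Path(output, Cluster.INITIAL_CLUSTERS_DIR)`, that is, a child of the output directory. The sketch below illustrates only that path arithmetic. It uses `java.nio.file.Path` as a stand-in for Hadoop's `Path` so that it runs without Hadoop or Mahout, and it assumes the constant's value is `"clusters-0"`; check the Mahout version you use. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Stand-in sketch: java.nio.file.Path replaces org.apache.hadoop.fs.Path so the
// example runs without Hadoop or Mahout on the classpath.
final class InitialClustersDirSketch {

  // Assumed value of Cluster.INITIAL_CLUSTERS_DIR; verify against your Mahout version.
  static final String INITIAL_CLUSTERS_DIR = "clusters-0";

  // Mirrors new Path(output, Cluster.INITIAL_CLUSTERS_DIR): a child of the output directory.
  static Path initialClustersDir(Path output) {
    return output.resolve(INITIAL_CLUSTERS_DIR);
  }

  public static void main(String[] args) {
    Path output = Paths.get("output", "dirichlet");
    // Prints the resolved location using the platform's path separator.
    System.out.println(initialClustersDir(output));
  }
}
```

Because the directory is derived from `output`, each driver run keeps its prior clusters next to the results it produces rather than in a shared location.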

Example 2: buildClusters

/**
 * Iterate over the input vectors to produce cluster directories for each iteration
 * 
 * @param conf
 *          the Configuration to use
 * @param input
 *          the directory pathname for input points
 * @param clustersIn
 *          the directory pathname for initial and computed clusters
 * @param output
 *          the directory pathname for output points
 * @param measure
 *          the DistanceMeasure to use
 * @param maxIterations
 *          the maximum number of iterations
 * @param delta
 *          the convergence delta value, as a String that is parsed to a double
 * @param runSequential
 *          if true execute sequential algorithm
 * 
 * @return the Path of the final clusters directory
 */
public static Path buildClusters(Configuration conf, Path input, Path clustersIn, Path output,
    DistanceMeasure measure, int maxIterations, String delta, boolean runSequential) throws IOException,
    InterruptedException, ClassNotFoundException {
  
  double convergenceDelta = Double.parseDouble(delta);
  List<Cluster> clusters = new ArrayList<Cluster>();
  KMeansUtil.configureWithClusterInfo(conf, clustersIn, clusters);
  
  if (clusters.isEmpty()) {
    throw new IllegalStateException("No input clusters found in " + clustersIn + ". Check your -c argument.");
  }
  
  Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
  ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
  ClusterClassifier prior = new ClusterClassifier(clusters, policy);
  prior.writeToSeqFiles(priorClustersPath);
  
  if (runSequential) {
    ClusterIterator.iterateSeq(conf, input, priorClustersPath, output, maxIterations);
  } else {
    ClusterIterator.iterateMR(conf, input, priorClustersPath, output, maxIterations);
  }
  return output;
}

Developer: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines of code: 46, Source: KMeansDriver.java
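The Javadoc above speaks of producing a cluster directory for each iteration, with the prior written to the initial directory first. The following sketch lists the directory names such a run might leave under `output`. It rests on assumptions not stated in the listing: that the constants are `"clusters-"` and `"-final"`, and that the last iteration's directory receives the final suffix. Treat it as an illustration of a naming scheme, not as a description of what `ClusterIterator` does in every Mahout version. All names in the sketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ClusterDirNamesSketch {

  // Assumed values, modeled on Mahout's Cluster constants; verify before relying on them.
  static final String CLUSTERS_DIR = "clusters-";
  static final String FINAL_ITERATION_SUFFIX = "-final";

  // Returns the directory names expected after the given number of completed iterations.
  static List<String> directoryNames(int iterationsRun) {
    if (iterationsRun < 0) {
      throw new IllegalArgumentException("iterationsRun must be >= 0");
    }
    List<String> names = new ArrayList<>();
    names.add(CLUSTERS_DIR + 0); // the prior, i.e. the initial clusters directory
    for (int i = 1; i <= iterationsRun; i++) {
      String name = CLUSTERS_DIR + i;
      if (i == iterationsRun) {
        name += FINAL_ITERATION_SUFFIX; // only the last iteration is marked final
      }
      names.add(name);
    }
    return names;
  }

  public static void main(String[] args) {
    // prints [clusters-0, clusters-1, clusters-2, clusters-3-final]
    System.out.println(directoryNames(3));
  }
}
```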

Example 3: buildClusters

/**
 * Iterate over the input vectors to produce cluster directories for each iteration
 * 
 * @param conf
 *          the Configuration to use; a new Configuration is created if null
 * @param input
 *          the directory pathname for input points
 * @param clustersIn
 *          the file pathname for initial cluster centers
 * @param output
 *          the directory pathname for output points
 * @param measure
 *          the DistanceMeasure to use
 * @param convergenceDelta
 *          the convergence delta value
 * @param maxIterations
 *          the maximum number of iterations
 * @param m
 *          the fuzzification factor, see
 *          http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
 * @param runSequential if true run in sequential execution mode
 * 
 * @return the Path of the final clusters directory
 */
public static Path buildClusters(Configuration conf,
                                 Path input,
                                 Path clustersIn,
                                 Path output,
                                 DistanceMeasure measure,
                                 double convergenceDelta,
                                 int maxIterations,
                                 float m,
                                 boolean runSequential)
  throws IOException, InterruptedException, ClassNotFoundException {
  
  // Default the configuration before anything reads it.
  if (conf == null) {
    conf = new Configuration();
  }
  
  List<Cluster> clusters = new ArrayList<Cluster>();
  FuzzyKMeansUtil.configureWithClusterInfo(conf, clustersIn, clusters);
  
  if (clusters.isEmpty()) {
    throw new IllegalStateException("No input clusters found in " + clustersIn + ". Check your -c argument.");
  }
  
  Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);   
  ClusteringPolicy policy = new FuzzyKMeansClusteringPolicy(m, convergenceDelta);
  ClusterClassifier prior = new ClusterClassifier(clusters, policy);
  prior.writeToSeqFiles(priorClustersPath);
  
  if (runSequential) {
    ClusterIterator.iterateSeq(conf, input, priorClustersPath, output, maxIterations);
  } else {
    ClusterIterator.iterateMR(conf, input, priorClustersPath, output, maxIterations);
  }
  return output;
}

Developer: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines of code: 55, Source: FuzzyKMeansDriver.java
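This driver accepts a null configuration and substitutes a default. That guard only helps if it runs before the first call that reads the configuration. A minimal sketch of the ordering follows, using `java.util.Properties` as a stand-in for Hadoop's `Configuration`. The helper `readSetting` and the key `distance.measure` are hypothetical and stand for any code that dereferences the configuration.

```java
import java.util.Properties;

final class ConfigDefaultSketch {

  // Hypothetical helper that reads from the configuration; it would throw
  // a NullPointerException if handed a null object.
  static String readSetting(Properties conf, String key) {
    return conf.getProperty(key, "unset");
  }

  static String build(Properties conf) {
    // Substitute the default first so every later use sees a non-null object.
    if (conf == null) {
      conf = new Properties();
    }
    return readSetting(conf, "distance.measure");
  }

  public static void main(String[] args) {
    // A null configuration is tolerated because the guard runs before the read.
    System.out.println(build(null));
  }
}
```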

Example 4: run

/**
 * Run the job where the input format can be either Vectors or Canopies. If
 * requested, cluster the input data using the computed Canopies
 * 
 * @param conf
 *          the Configuration to use
 * @param input
 *          the input pathname String
 * @param output
 *          the output pathname String
 * @param measure
 *          the DistanceMeasure
 * @param kernelProfile
 *          the IKernelProfile
 * @param t1
 *          the T1 distance threshold
 * @param t2
 *          the T2 distance threshold
 * @param convergenceDelta
 *          the double convergence criteria
 * @param maxIterations
 *          an int number of iterations
 * @param inputIsCanopies
 *          true if the input path already contains MeanShiftCanopies and does
 *          not need to be converted from Vectors
 * @param runClustering
 *          true if the input points are to be clustered once the iterations
 *          complete
 * @param runSequential
 *          if true run in sequential execution mode
 */
public static void run(Configuration conf, Path input, Path output,
    DistanceMeasure measure, IKernelProfile kernelProfile, double t1,
    double t2, double convergenceDelta, int maxIterations,
    boolean inputIsCanopies, boolean runClustering, boolean runSequential)
    throws IOException, InterruptedException, ClassNotFoundException {
  Path clustersIn = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
  if (inputIsCanopies) {
    clustersIn = input;
  } else {
    createCanopyFromVectors(conf, input, clustersIn, measure, runSequential);
  }

  Path clustersOut = buildClusters(conf, clustersIn, output, measure,
      kernelProfile, t1, t2, convergenceDelta, maxIterations, runSequential,
      runClustering);
  if (runClustering) {
    clusterData(inputIsCanopies ? input : new Path(output,
        Cluster.INITIAL_CLUSTERS_DIR), clustersOut, new Path(output,
        Cluster.CLUSTERED_POINTS_DIR), runSequential);
  }
}

Developer: saradelrio, Project: Chi-FRBCS-BigDataCS, Lines of code: 52, Source: MeanShiftCanopyDriver.java
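In this driver the initial clusters directory is only a default: when the input already contains canopies, the input path itself is used instead. The sketch below isolates that choice. As before, `java.nio.file.Path` stands in for Hadoop's `Path`, the value `"clusters-0"` is an assumption about the constant, and the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class CanopyInputSketch {

  // Assumed value of Cluster.INITIAL_CLUSTERS_DIR; verify against your Mahout version.
  static final String INITIAL_CLUSTERS_DIR = "clusters-0";

  // Picks where the clustering step reads its starting canopies from.
  static Path chooseClustersIn(Path input, Path output, boolean inputIsCanopies) {
    // Canopy input is used as-is; vector input is first converted into
    // canopies that are written under the output directory.
    return inputIsCanopies ? input : output.resolve(INITIAL_CLUSTERS_DIR);
  }

  public static void main(String[] args) {
    Path input = Paths.get("data", "points");
    Path output = Paths.get("results");
    System.out.println(chooseClustersIn(input, output, true));
    System.out.println(chooseClustersIn(input, output, false));
  }
}
```

The same selection is made again in the listing when the points are finally clustered, so a single helper of this kind would keep the two places consistent.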

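Note: The org.apache.mahout.clustering.Cluster.INITIAL_CLUSTERS_DIR examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects, and copyright of the source code belongs to the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.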