Java DataSet.mapPartition Method Code Examples

This article collects typical usage examples of the org.apache.flink.api.java.DataSet.mapPartition method in Java. If you are wondering how DataSet.mapPartition is used in practice, or are looking for concrete examples of it, the selected code samples below may help. You can also look further into usage examples of the enclosing class, org.apache.flink.api.java.DataSet.

Four code examples of the DataSet.mapPartition method are shown below. By default they are sorted by popularity.

Example 1: countElementsPerPartition

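import org.apache.flink.api.common.functions.RichMapPartitionFunction;
import org.apache.flink.api.java.DataSet; // import the package/class the method depends on
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
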
/**
 * Iterates over all elements of each partition and counts them.
 *
 * @param input the DataSet received as input
 * @return a data set of tuples, each pairing a subtask index with the number of elements in that partition
 */
public static <T> DataSet<Tuple2<Integer, Long>> countElementsPerPartition(DataSet<T> input) {
	return input.mapPartition(new RichMapPartitionFunction<T, Tuple2<Integer, Long>>() {
		@Override
		public void mapPartition(Iterable<T> values, Collector<Tuple2<Integer, Long>> out) throws Exception {
			long counter = 0;
			for (T value : values) {
				counter++;
			}
			out.collect(new Tuple2<>(getRuntimeContext().getIndexOfThisSubtask(), counter));
		}
	});
}

Developer: axbaretto, Project: flink, Lines of code: 20, Source: DataSetUtils.java
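
To make the semantics of Example 1 concrete without a Flink cluster, here is a minimal plain-Java sketch of what the function computes. It assumes each partition can be modelled as an in-memory list and that the list position stands in for the subtask index. `PartitionCountSketch` and `countPerPartition` are hypothetical names used only for illustration and are not part of the Flink API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Flink job: each inner list models one partition.
class PartitionCountSketch {

    // Returns a map from partition index (standing in for the subtask index) to element count.
    static <T> Map<Integer, Long> countPerPartition(List<List<T>> partitions) {
        Map<Integer, Long> counts = new LinkedHashMap<>();
        for (int subtask = 0; subtask < partitions.size(); subtask++) {
            long counter = 0;
            // As with mapPartition, the whole partition is visited once
            // and a single record is produced for it.
            for (T ignored : partitions.get(subtask)) {
                counter++;
            }
            counts.put(subtask, counter);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("d"),
            Arrays.<String>asList());
        System.out.println(countPerPartition(partitions)); // prints {0=3, 1=1, 2=0}
    }
}
```

The point of the sketch is that the function is called once per partition rather than once per element, which is why a single counter per call is enough.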

Example 2: zipWithUniqueId

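import org.apache.flink.api.common.functions.RichMapPartitionFunction;
import org.apache.flink.api.java.DataSet; // import the package/class the method depends on
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;
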
/**
 * Assigns a unique {@link Long} value to every element of the input data set, as follows:
 * <ul>
 *  <li> a map function is applied to the input data set
 *  <li> each map task holds a counter c which is increased for each record
 *  <li> c is shifted left by n bits, where n is the bit size of (number of parallel tasks - 1),
 *       roughly log2(number of parallel tasks)
 *  <li> to create a unique ID among all tasks, the task id is added to the shifted counter
 *  <li> for each record, the resulting ID is collected
 * </ul>
 *
 * @param input the input data set
 * @return a data set of Tuple2 pairing each generated id with its original value
 */
public static <T> DataSet<Tuple2<Long, T>> zipWithUniqueId(DataSet<T> input) {

	return input.mapPartition(new RichMapPartitionFunction<T, Tuple2<Long, T>>() {

		// getBitSize is a helper from the enclosing class; its body is not part of this excerpt.
		long maxBitSize = getBitSize(Long.MAX_VALUE);
		long shifter = 0;
		long start = 0;
		long taskId = 0;
		long label = 0;

		@Override
		public void open(Configuration parameters) throws Exception {
			super.open(parameters);
			shifter = getBitSize(getRuntimeContext().getNumberOfParallelSubtasks() - 1);
			taskId = getRuntimeContext().getIndexOfThisSubtask();
		}

		@Override
		public void mapPartition(Iterable<T> values, Collector<Tuple2<Long, T>> out) throws Exception {
			for (T value : values) {
				label = (start << shifter) + taskId;

				if (getBitSize(start) + shifter < maxBitSize) {
					out.collect(new Tuple2<>(label, value));
					start++;
				} else {
					throw new Exception("Exceeded Long value range while generating labels");
				}
			}
		}
	});
}

Developer: axbaretto, Project: flink, Lines of code: 46, Source: DataSetUtils.java
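
The ID scheme in Example 2 can be checked in isolation. The sketch below is plain Java with no Flink dependency. It assumes that the `getBitSize` helper, whose body is not shown in the excerpt, returns the number of bits needed to represent its argument. The `bitSize` and `uniqueId` methods and the `UniqueIdSketch` class are hypothetical names introduced here to illustrate that assumption.

```java
import java.util.HashSet;
import java.util.Set;

class UniqueIdSketch {

    // Number of bits needed to represent a non-negative value (0 needs 0 bits).
    // Assumption: this matches the intent of the getBitSize helper, which is not shown above.
    static int bitSize(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    // The low bits carry the task index, the remaining high bits carry the per-task counter.
    static long uniqueId(long counter, int taskId, int parallelism) {
        int shifter = bitSize(parallelism - 1);
        return (counter << shifter) + taskId;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        Set<Long> seen = new HashSet<>();
        for (int taskId = 0; taskId < parallelism; taskId++) {
            for (long counter = 0; counter < 3; counter++) {
                long id = uniqueId(counter, taskId, parallelism);
                // Set.add returns false for a value that is already present.
                if (!seen.add(id)) {
                    throw new IllegalStateException("duplicate id " + id);
                }
                System.out.println("task " + taskId + ", record " + counter + " -> id " + id);
            }
        }
    }
}
```

With a parallelism of 4 the shift is 2 bits, so the third record (counter 2) of task 1 gets the ID (2 << 2) + 1 = 9. The IDs are unique across tasks but not consecutive, because tasks that process fewer records leave gaps.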

Example 3: sampleWithSize

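import org.apache.flink.api.java.DataSet; // import the package/class the method depends on
import org.apache.flink.api.java.Utils;
import org.apache.flink.api.java.functions.SampleInCoordinator;
import org.apache.flink.api.java.functions.SampleInPartition;
import org.apache.flink.api.java.operators.GroupReduceOperator;
import org.apache.flink.api.java.operators.MapPartitionOperator;
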
/**
 * Generates a sample of the DataSet that contains a fixed number of elements.
 *
 * <p><strong>NOTE:</strong> Sampling with a fixed size is less efficient than sampling with a fraction; use
 * sampling with a fraction unless you need an exact sample size.
 *
 * @param input           The DataSet to sample from.
 * @param withReplacement Whether an element can be selected more than once.
 * @param numSamples      The expected sample size.
 * @param seed            Random number generator seed.
 * @return The sampled DataSet
 */
public static <T> DataSet<T> sampleWithSize(
	DataSet<T> input,
	final boolean withReplacement,
	final int numSamples,
	final long seed) {

	SampleInPartition<T> sampleInPartition = new SampleInPartition<>(withReplacement, numSamples, seed);
	MapPartitionOperator mapPartitionOperator = input.mapPartition(sampleInPartition);

	// There is no previous group, so the parallelism of GroupReduceOperator is always 1.
	String callLocation = Utils.getCallLocationName();
	SampleInCoordinator<T> sampleInCoordinator = new SampleInCoordinator<>(withReplacement, numSamples, seed);
	return new GroupReduceOperator<>(mapPartitionOperator, input.getType(), sampleInCoordinator, callLocation);
}

Developer: axbaretto, Project: flink, Lines of code: 26, Source: DataSetUtils.java
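
Example 3 runs a per-partition step followed by a single coordinating step. One common way to get a fixed-size sample without replacement in two such phases is to give every element a random weight, keep the k largest weights per partition, then keep the k largest overall. The plain-Java sketch below illustrates that idea only. Whether it matches the internals of `SampleInPartition` and `SampleInCoordinator` is not confirmed by the excerpt, and every name in it (`TwoPhaseSampleSketch`, `Weighted`, `partitionPhase`, `coordinatorPhase`) is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

class TwoPhaseSampleSketch {

    // A value paired with the random weight that decides whether it survives.
    static final class Weighted<T> {
        final double weight;
        final T value;

        Weighted(double weight, T value) {
            this.weight = weight;
            this.value = value;
        }
    }

    // Keeps the k candidates with the largest weights, using a min-heap capped at k entries.
    static <T> List<Weighted<T>> topK(Iterable<Weighted<T>> candidates, int k) {
        PriorityQueue<Weighted<T>> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Weighted<T> w) -> w.weight));
        for (Weighted<T> candidate : candidates) {
            heap.add(candidate);
            if (heap.size() > k) {
                heap.poll(); // drop the smallest weight seen so far
            }
        }
        return new ArrayList<>(heap);
    }

    // Phase 1, once per partition: weight every element and keep at most k of them.
    static <T> List<Weighted<T>> partitionPhase(List<T> partition, int k, Random rnd) {
        List<Weighted<T>> weighted = new ArrayList<>();
        for (T value : partition) {
            weighted.add(new Weighted<>(rnd.nextDouble(), value));
        }
        return topK(weighted, k);
    }

    // Phase 2, with parallelism 1: merge all partition candidates and keep the global top k.
    static <T> List<T> coordinatorPhase(List<List<Weighted<T>>> perPartition, int k) {
        List<Weighted<T>> all = new ArrayList<>();
        for (List<Weighted<T>> candidates : perPartition) {
            all.addAll(candidates);
        }
        List<T> result = new ArrayList<>();
        for (Weighted<T> w : topK(all, k)) {
            result.add(w.value);
        }
        return result;
    }

    static <T> List<T> sample(List<List<T>> partitions, int k, long seed) {
        // Simplification: one generator is shared here; a distributed job
        // would need an independent generator per parallel task.
        Random rnd = new Random(seed);
        List<List<Weighted<T>>> candidates = new ArrayList<>();
        for (List<T> partition : partitions) {
            candidates.add(partitionPhase(partition, k, rnd));
        }
        return coordinatorPhase(candidates, k);
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 2, 3, 4), Arrays.asList(5, 6), Arrays.asList(7, 8, 9));
        // Prints three distinct input values; which ones depends on the seed.
        System.out.println(sample(partitions, 3, 42L));
    }
}
```

Each partition forwards at most k candidates, so the coordinator handles at most k times the number of partitions regardless of input size. The extra coordinating pass is consistent with the Javadoc note that fixed-size sampling is less efficient than sampling with a fraction.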

Example 4: sample

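import org.apache.flink.api.java.DataSet; // import the package/class the method depends on
import org.apache.flink.api.java.functions.SampleWithFraction;
import org.apache.flink.api.java.operators.MapPartitionOperator;
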
/**
 * Generates a sample of the DataSet by selecting each element with the given probability fraction.
 *
 * @param input           The DataSet to sample from.
 * @param withReplacement Whether an element can be selected more than once.
 * @param fraction        Probability that each element is chosen; must be in [0, 1] without replacement
 *                        and in [0, ∞) with replacement. When fraction is larger than 1, each element is
 *                        expected to be selected multiple times on average.
 * @param seed            Random number generator seed.
 * @return The sampled DataSet
 */
public static <T> MapPartitionOperator<T, T> sample(
	DataSet<T> input,
	final boolean withReplacement,
	final double fraction,
	final long seed) {

	return input.mapPartition(new SampleWithFraction<T>(withReplacement, fraction, seed));
}

Developer: axbaretto, Project: flink, Lines of code: 19, Source: DataSetUtils.java
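
For the without-replacement case, "probability fraction of each element" can be read as an independent coin flip per element. The plain-Java sketch below shows that reading for a single partition. It is an illustration under that assumption, not the implementation of `SampleWithFraction`, and it does not cover sampling with replacement. `FractionSampleSketch` and `sampleWithoutReplacement` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class FractionSampleSketch {

    // Keeps each element independently with probability `fraction`.
    static <T> List<T> sampleWithoutReplacement(Iterable<T> partition, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1] without replacement");
        }
        Random rnd = new Random(seed);
        List<T> sampled = new ArrayList<>();
        for (T value : partition) {
            // nextDouble() is in [0, 1), so fraction 0 keeps nothing and fraction 1 keeps everything.
            if (rnd.nextDouble() < fraction) {
                sampled.add(value);
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        List<Integer> partition = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // The sample size varies from seed to seed; only its expected value is fraction * input size.
        System.out.println(sampleWithoutReplacement(partition, 0.3, 7L));
    }
}
```

Because every element is decided on its own, this needs a single pass and no coordination between partitions, but the size of the result is only approximately fraction times the input size.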

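Note: The org.apache.flink.api.java.DataSet.mapPartition examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippets were selected from open-source projects contributed by their respective developers, and copyright of the source code belongs to the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.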