

Java PCollectionTuple Class Code Examples

This article collects typical usage examples of the Java class com.google.cloud.dataflow.sdk.values.PCollectionTuple. If you have been wondering what PCollectionTuple is for, how to use it, or what working examples look like, the curated class examples below should help.


The PCollectionTuple class belongs to the com.google.cloud.dataflow.sdk.values package. Three code examples of the class are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your ratings help the system recommend better Java code examples.
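Before the examples, it may help to see the core idea: a PCollectionTuple is essentially a heterogeneous, tag-keyed bundle of collections, built with `of(...)`/`and(...)` and read back with `get(tag)`. The following is a minimal, self-contained plain-Java sketch of that idea only; `MiniTuple` is an illustrative stand-in, not the SDK class (the real one also tracks the Pipeline and coders):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for the tag-keyed-bundle idea behind PCollectionTuple.
// Tags are modeled as plain Strings; values as arbitrary Objects.
class MiniTuple {
    private final Map<String, Object> map = new LinkedHashMap<>();

    // Mirrors the PCollectionTuple.of(tag, pc).and(tag2, pc2) building style.
    static MiniTuple of(String tag, Object value) {
        return new MiniTuple().and(tag, value);
    }

    MiniTuple and(String tag, Object value) {
        map.put(tag, value);
        return this;
    }

    // The caller chooses the result type, as with PCollectionTuple.get(tag).
    @SuppressWarnings("unchecked")
    <T> T get(String tag) {
        return (T) map.get(tag);
    }

    int size() {
        return map.size();
    }
}

public class MiniTupleDemo {
    public static void main(String[] args) {
        MiniTuple t = MiniTuple.of("lower", List.of("a", "b"))
                               .and("upper", List.of("A"));
        List<String> lower = t.get("lower");               // type chosen by caller
        System.out.println(t.size() + " " + lower.get(0)); // prints "2 a"
    }
}
```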

Example 1: apply

import com.google.cloud.dataflow.sdk.values.PCollectionTuple; // import the required package/class
@Override
public PCollectionTuple apply(PCollection<String> lines) {
  // Convert lines of text into individual words.
  PCollectionTuple lowerUpper = lines
      .apply(ParDo.of(extractWordsFn)
          .withSideInputs(regex)
          .withOutputTags(lower, TupleTagList.of(upper)));
  lowerUpper.get(lower).setCoder(StringUtf8Coder.of());
  lowerUpper.get(upper).setCoder(StringUtf8Coder.of());
  PCollection<KV<String, Long>> lowerCounts = lowerUpper.get(lower).apply(Count
      .<String>perElement());
  PCollection<KV<String, Long>> upperCounts = lowerUpper.get(upper).apply(Count
      .<String>perElement());
  // lower, upper, lowerCnts, upperCnts, extractWordsFn and regex are fields
  // defined on the enclosing class (not shown in this snippet).
  return PCollectionTuple
      .of(lowerCnts, lowerCounts)
      .and(upperCnts, upperCounts);
}
 
Author: shakamunyi, Project: spark-dataflow, Lines: 18, Source: MultiOutputWordCountTest.java
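The tags in Example 1 are typically created with this SDK's idiom `new TupleTag<String>() {}`: the trailing `{}` makes an anonymous subclass, which preserves the generic type argument at runtime despite erasure (the SDK uses this to infer a default coder). Below is a self-contained plain-Java illustration of why the anonymous subclass matters; the `Tag` class is a stand-in, not the SDK's TupleTag:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Stand-in demonstrating the anonymous-subclass type-capture idiom.
abstract class Tag<T> {
    // Recovers T from the generic superclass of the anonymous subclass.
    Type argType() {
        ParameterizedType superType =
            (ParameterizedType) getClass().getGenericSuperclass();
        return superType.getActualTypeArguments()[0];
    }
}

public class TagDemo {
    public static void main(String[] args) {
        Tag<String> tag = new Tag<String>() {}; // {} captures the String argument
        System.out.println(tag.argType());      // prints "class java.lang.String"
    }
}
```

Without the `{}`, `getGenericSuperclass()` would not expose the type argument, because a plain `new Tag<String>()` instance (were Tag concrete) carries no reified generic information.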

Example 2: multiDo

import com.google.cloud.dataflow.sdk.values.PCollectionTuple; // import the required package/class
private static <I, O> TransformEvaluator<ParDo.BoundMulti<I, O>> multiDo() {
  return new TransformEvaluator<ParDo.BoundMulti<I, O>>() {
    @Override
    public void evaluate(ParDo.BoundMulti<I, O> transform, EvaluationContext context) {
      TupleTag<O> mainOutputTag = MULTIDO_FG.get("mainOutputTag", transform);
      MultiDoFnFunction<I, O> multifn = new MultiDoFnFunction<>(
          transform.getFn(),
          context.getRuntimeContext(),
          mainOutputTag,
          getSideInputs(transform.getSideInputs(), context));

      @SuppressWarnings("unchecked")
      JavaRDDLike<WindowedValue<I>, ?> inRDD =
          (JavaRDDLike<WindowedValue<I>, ?>) context.getInputRDD(transform);
      JavaPairRDD<TupleTag<?>, WindowedValue<?>> all = inRDD
          .mapPartitionsToPair(multifn)
          .cache();

      PCollectionTuple pct = context.getOutput(transform);
      for (Map.Entry<TupleTag<?>, PCollection<?>> e : pct.getAll().entrySet()) {
        @SuppressWarnings("unchecked")
        JavaPairRDD<TupleTag<?>, WindowedValue<?>> filtered =
            all.filter(new TupleTagFilter(e.getKey()));
        @SuppressWarnings("unchecked")
        // Object is the best we can do since different outputs can have different tags
        JavaRDD<WindowedValue<Object>> values =
            (JavaRDD<WindowedValue<Object>>) (JavaRDD<?>) filtered.values();
        context.setRDD(e.getValue(), values);
      }
    }
  };
}
 
Author: shakamunyi, Project: spark-dataflow, Lines: 33, Source: TransformTranslator.java
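`TupleTagFilter` is not shown in the snippet above; from its use, it presumably keeps only the (tag, value) pairs whose tag matches one particular output's tag, so each PCollection in the tuple receives its own RDD. A plain-Java sketch of that routing step, with tags modeled as Strings and windowed values as Objects (illustrative names, not the project's code):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TagRouting {
    // Keep only the pairs carrying the requested tag, then drop the tag,
    // mirroring all.filter(new TupleTagFilter(tag)).values() above.
    static List<Object> valuesForTag(List<Map.Entry<String, Object>> all,
                                     String tag) {
        return all.stream()
                  .filter(e -> e.getKey().equals(tag)) // the "filter" step
                  .map(Map.Entry::getValue)            // the "values" step
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> all = List.of(
            new SimpleEntry<>("lower", "here"),
            new SimpleEntry<>("upper", "Here"),
            new SimpleEntry<>("lower", "are"));
        System.out.println(valuesForTag(all, "lower")); // prints "[here, are]"
    }
}
```

Note that the snippet caches the tagged RDD before this loop: every output tag triggers its own filter pass, so without `cache()` the upstream DoFn would be re-executed once per output.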

Example 3: testRun

import com.google.cloud.dataflow.sdk.values.PCollectionTuple; // import the required package/class
@Test
public void testRun() throws Exception {
  Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
  PCollection<String> regex = p.apply(Create.of("[^a-zA-Z']+"));
  PCollection<String> w1 = p.apply(Create.of("Here are some words to count", "and some others"));
  PCollection<String> w2 = p.apply(Create.of("Here are some more words", "and even more words"));
  PCollectionList<String> list = PCollectionList.of(w1).and(w2);

  PCollection<String> union = list.apply(Flatten.<String>pCollections());
  PCollectionView<String> regexView = regex.apply(View.<String>asSingleton());
  CountWords countWords = new CountWords(regexView);
  PCollectionTuple luc = union.apply(countWords);
  PCollection<Long> unique = luc.get(lowerCnts).apply(
      ApproximateUnique.<KV<String, Long>>globally(16));

  EvaluationResult res = SparkPipelineRunner.create().run(p);
  Iterable<KV<String, Long>> actualLower = res.get(luc.get(lowerCnts));
  Assert.assertEquals("are", actualLower.iterator().next().getKey());
  Iterable<KV<String, Long>> actualUpper = res.get(luc.get(upperCnts));
  Assert.assertEquals("Here", actualUpper.iterator().next().getKey());
  Iterable<Long> actualUniqCount = res.get(unique);
  Assert.assertEquals(9, (long) actualUniqCount.iterator().next());
  int actualTotalWords = res.getAggregatorValue("totalWords", Integer.class);
  Assert.assertEquals(18, actualTotalWords);
  int actualMaxWordLength = res.getAggregatorValue("maxWordLength", Integer.class);
  Assert.assertEquals(6, actualMaxWordLength);
  AggregatorValues<Integer> aggregatorValues = res.getAggregatorValues(countWords
      .getTotalWordsAggregator());
  Assert.assertEquals(18, Iterables.getOnlyElement(aggregatorValues.getValues()).intValue());

  res.close();
}
 
Author: shakamunyi, Project: spark-dataflow, Lines: 33, Source: MultiOutputWordCountTest.java
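Example 3 bounds the distinct-word count with `ApproximateUnique.globally(16)`, which estimates cardinality from a bounded sample of hash values rather than holding all elements. The SDK's implementation is not shown here; the following is a minimal K-minimum-values style sketch of the same idea in plain Java (`KmvEstimator` and its method are hypothetical names, not the SDK's code):

```java
import java.util.List;
import java.util.TreeSet;

public class KmvEstimator {
    // Estimate the number of distinct items, keeping only the k smallest
    // hash values normalized into (0, 1).
    static long estimateDistinct(Iterable<String> items, int k) {
        TreeSet<Double> smallest = new TreeSet<>();
        for (String s : items) {
            double h = (s.hashCode() & 0x7fffffffL) / (double) (1L << 31);
            smallest.add(h);                  // TreeSet also dedups repeats
            if (smallest.size() > k) {
                smallest.pollLast();          // keep only the k smallest
            }
        }
        if (smallest.size() < k) {
            return smallest.size();           // fewer than k distinct: exact
        }
        return Math.round((k - 1) / smallest.last()); // KMV estimate
    }

    public static void main(String[] args) {
        // With fewer than k distinct items the estimate is exact.
        System.out.println(
            estimateDistinct(List.of("a", "b", "a", "c"), 16)); // prints "3"
    }
}
```

The sample size (16 in the test) trades memory for accuracy, which is why the test asserts an approximate count of 9 for the 18 input words rather than an exact figure from a full GroupByKey.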


Note: the com.google.cloud.dataflow.sdk.values.PCollectionTuple class examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippets are selected from open-source projects contributed by various developers; copyright of the source code belongs to the original authors. Refer to the corresponding project's License before distributing or using the code; do not reproduce this article without permission.