

Python DataflowRunner._pardo_fn_data Method Code Examples

This article collects typical usage examples of the Python method apache_beam.runners.DataflowRunner._pardo_fn_data. If you are wondering what DataflowRunner._pardo_fn_data does, how to use it, or what calling it looks like in practice, the curated example below may help. You can also explore further usage examples of the containing class, apache_beam.runners.DataflowRunner.


The following shows 1 code example of the DataflowRunner._pardo_fn_data method, sorted by popularity by default. You can upvote examples you like or find useful; your feedback helps the system recommend better Python code examples.

Example 1: run_ParDo

# Required import: from apache_beam.runners import DataflowRunner [as alias]
# Or: from apache_beam.runners.DataflowRunner import _pardo_fn_data [as alias]
  def run_ParDo(self, transform_node):
    transform = transform_node.transform
    output = transform_node.outputs[None]
    element_coder = self._get_coder(output)
    map_task_index, producer_index, output_index = self.outputs[
        transform_node.inputs[0]]

    # If any of this ParDo's side inputs depend on outputs from this map task,
    # we can't continue growing this map task (see the standalone reachability
    # sketch after this example).
    def is_reachable(leaf, root):
      if leaf == root:
        return True
      else:
        return any(is_reachable(x, root) for x in self.dependencies[leaf])

    if any(is_reachable(self.outputs[side_input.pvalue][0], map_task_index)
           for side_input in transform_node.side_inputs):
      # Start a new map task.
      input_element_coder = self._get_coder(transform_node.inputs[0])

      output_buffer = OutputBuffer(input_element_coder)

      fusion_break_write = operation_specs.WorkerInMemoryWrite(
          output_buffer=output_buffer,
          write_windowed_values=True,
          input=(producer_index, output_index),
          output_coders=[input_element_coder])
      self.map_tasks[map_task_index].append(
          (transform_node.full_label + '/Write', fusion_break_write))

      original_map_task_index = map_task_index
      map_task_index, producer_index, output_index = len(self.map_tasks), 0, 0

      fusion_break_read = operation_specs.WorkerRead(
          output_buffer.source_bundle(),
          output_coders=[input_element_coder])
      self.map_tasks.append(
          [(transform_node.full_label + '/Read', fusion_break_read)])

      self.dependencies[map_task_index].add(original_map_task_index)

    def create_side_read(side_input):
      label = self.side_input_labels[side_input]
      output_buffer = self.run_side_write(
          side_input.pvalue, '%s/%s' % (transform_node.full_label, label))
      return operation_specs.WorkerSideInputSource(
          output_buffer.source(), label)

    do_op = operation_specs.WorkerDoFn(
        serialized_fn=pickler.dumps(DataflowRunner._pardo_fn_data(
            transform_node,
            lambda side_input: self.side_input_labels[side_input])),
        output_tags=[PropertyNames.OUT] + ['%s_%s' % (PropertyNames.OUT, tag)
                                           for tag in transform.output_tags
                                          ],
        # Same assumption that DataflowRunner has about coders being compatible
        # across outputs.
        output_coders=[element_coder] * (len(transform.output_tags) + 1),
        input=(producer_index, output_index),
        side_inputs=[create_side_read(side_input)
                     for side_input in transform_node.side_inputs])

    producer_index = len(self.map_tasks[map_task_index])
    self.outputs[transform_node.outputs[None]] = (
        map_task_index, producer_index, 0)
    for ix, tag in enumerate(transform.output_tags):
      self.outputs[transform_node.outputs[
          tag]] = map_task_index, producer_index, ix + 1
    self.map_tasks[map_task_index].append((transform_node.full_label, do_op))

    for side_input in transform_node.side_inputs:
      self.dependencies[map_task_index].add(self.outputs[side_input.pvalue][0])
Developer ID: aaltay, Project: incubator-beam, Lines: 74, Source file: maptask_executor_runner.py
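
The nested is_reachable helper above does a recursive walk over the dependencies graph to decide whether the map task producing a side input (transitively) depends on the current map task, which would make further fusion unsafe. Below is a minimal standalone sketch of that check, assuming only that dependencies maps each map-task index to the set of task indices it depends on; the names and the toy graph are illustrative, not part of apache_beam.

from collections import defaultdict

def is_reachable(dependencies, leaf, root):
  # True if `root` is reachable from `leaf` by following dependency edges.
  if leaf == root:
    return True
  return any(is_reachable(dependencies, x, root) for x in dependencies[leaf])

# Toy dependency graph: task 2 depends on task 1, which depends on task 0.
dependencies = defaultdict(set)
dependencies[2].add(1)
dependencies[1].add(0)

print(is_reachable(dependencies, 2, 0))  # True: reachable transitively
print(is_reachable(dependencies, 0, 2))  # False: edges only point backwards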
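
The fusion break itself is plain bookkeeping over map_tasks and dependencies: close the current task with an in-memory write, open a new task that starts by reading the same buffer back, and record the inter-task dependency. The toy model below mirrors that bookkeeping with ordinary Python data structures standing in for WorkerInMemoryWrite and WorkerRead; it is a sketch of the pattern, not Beam's actual operation specs.

from collections import defaultdict

def break_fusion(map_tasks, dependencies, current_task, label):
  # Close `current_task` with a write to a buffer, then start a new map task
  # that begins by reading that buffer back, as in the example above.
  map_tasks[current_task].append((label + '/Write', 'buffer'))
  new_task = len(map_tasks)
  map_tasks.append([(label + '/Read', 'buffer')])
  dependencies[new_task].add(current_task)
  return new_task

map_tasks = [[('Source/Read', 'source')]]  # task 0 produces the ParDo's input
dependencies = defaultdict(set)
new_task = break_fusion(map_tasks, dependencies, 0, 'MyParDo')

print(new_task)            # 1
print(map_tasks[0][-1])    # ('MyParDo/Write', 'buffer')
print(map_tasks[1])        # [('MyParDo/Read', 'buffer')]
print(dict(dependencies))  # {1: {0}}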


Note: The apache_beam.runners.DataflowRunner._pardo_fn_data examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs, with snippets selected from open-source projects contributed by their respective developers. Copyright of the source code remains with the original authors; for distribution and use, please follow the corresponding project's license. Do not reproduce without permission.