Python tf.distribute.cluster_resolver.GCEClusterResolver: Usage and Code Examples

ClusterResolver for Google Compute Engine.

Inherits from: ClusterResolver

Usage

tf.distribute.cluster_resolver.GCEClusterResolver(
    project, zone, instance_group, port, task_type='worker', task_id=0,
    rpc_layer='grpc', credentials='default', service=None
)

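Parameters

project Name of the GCE project.
zone Zone of the GCE instance group.
instance_group Name of the GCE instance group.
port Port on which the TensorFlow server listens (documented default: 8470; the signature above lists it without a default, so pass it explicitly).
task_type Name of the TensorFlow job that the VM instances in this GCE instance group belong to.
task_id The task index of this particular VM within the GCE instance group. In particular, every instance should be manually assigned a unique ordinal index within the instance group so that the instances can be distinguished from each other.
rpc_layer The RPC layer TensorFlow should use to communicate across instances.
credentials GCE credentials. If nothing is specified, this defaults to GoogleCredentials.get_application_default().
service The GCE API object returned by the googleapiclient.discovery function (default: discovery.build('compute', 'v1')). If you specify a custom service object, the credentials parameter is ignored.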
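
The task_id description above leaves index assignment to the user. The following is a minimal sketch of one possible convention, assuming (hypothetically) that every VM knows the full list of instance names in the group: sort the names and use each name's position as its index. The helper `assign_task_ids` and the instance names are illustrative and are not part of TensorFlow; the sketch only guarantees that the indexes are unique and stable across VMs.

```python
def assign_task_ids(instance_names):
    """Map each instance name to a unique ordinal index (hypothetical helper)."""
    # Sorting makes the mapping identical on every VM that sees the same names.
    unique_names = sorted(set(instance_names))
    return {name: index for index, name in enumerate(unique_names)}


# Made-up instance names, listed in arbitrary order.
task_ids = assign_task_ids(["vm-b", "vm-a", "vm-c"])

# A VM named "vm-b" would pass this value as task_id to the constructor.
my_task_id = task_ids["vm-b"]
```

Any other scheme works equally well as long as no two instances share an index.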

Raises

ImportError If googleapiclient is not installed.
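
Because the constructor raises ImportError when googleapiclient is missing, it can help to check for the package before building the resolver. The sketch below uses only the standard library; the helper name `googleapiclient_available` is hypothetical, and the message assumes the package is installed from PyPI under the name google-api-python-client.

```python
import importlib.util


def googleapiclient_available():
    """Return True if the googleapiclient package can be imported."""
    # find_spec returns None for a top-level package that is not installed.
    return importlib.util.find_spec("googleapiclient") is not None


if not googleapiclient_available():
    # Assumption: the package is distributed as google-api-python-client.
    print("googleapiclient is missing; try: pip install google-api-python-client")
```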

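Attributes

environment Returns the current environment in which TensorFlow is running.
There are two possible return values: "google" (when TensorFlow is running in a Google-internal environment) or an empty string (when TensorFlow is running elsewhere).

If you are implementing a ClusterResolver that works in both the Google environment and the open-source world (for instance, a TPU ClusterResolver or similar), you will have to detect the environment and return the appropriate string.

Otherwise, if you are implementing a ClusterResolver that only works in open-source TensorFlow, you do not need to implement this property.

rpc_layer

task_id Returns the task id that this ClusterResolver indicates.

In a TensorFlow distributed environment, each job may have an applicable task id, which is the index of the instance within its task type. This is useful when the user needs to run specific code according to the task index. For example,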

import tensorflow as tf

cluster_spec = tf.train.ClusterSpec({
    "ps": ["localhost:2222", "localhost:2223"],
    "worker": ["localhost:2224", "localhost:2225", "localhost:2226"]
})

# SimpleClusterResolver is used here for illustration; other cluster
# resolvers may be used for other sources of task type/id.
cluster_resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
    cluster_spec, task_type="worker", task_id=0)

...

if cluster_resolver.task_type == 'worker' and cluster_resolver.task_id == 0:
  # Perform something that's only applicable on 'worker' type, id 0. This
  # block will run on this particular instance since we've specified this
  # task to be a 'worker', id 0 in above cluster resolver.
  pass
else:
  # Perform something that's only applicable on other ids. This block will
  # not run on this particular instance.
  pass

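Returns None if such information is not available or is not applicable in the current distributed environment, such as training with tf.distribute.cluster_resolver.TPUClusterResolver.

For more information, see the class docstring of tf.distribute.cluster_resolver.ClusterResolver.

task_type Returns the task type that this ClusterResolver indicates.

In a TensorFlow distributed environment, each job may have an applicable task type. Valid task types in TensorFlow include 'chief': a worker that is designated with more responsibility, 'worker': a regular worker for training/evaluation, 'ps': a parameter server, or 'evaluator': an evaluator that evaluates checkpoints for metrics.

See Multi-worker configuration for more information about the 'chief' and 'worker' task types, which are the most commonly used.

Having access to this information is useful when the user needs to run specific code according to the task type. For example,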

import tensorflow as tf

cluster_spec = tf.train.ClusterSpec({
    "ps": ["localhost:2222", "localhost:2223"],
    "worker": ["localhost:2224", "localhost:2225", "localhost:2226"]
})

# SimpleClusterResolver is used here for illustration; other cluster
# resolvers may be used for other sources of task type/id.
cluster_resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
    cluster_spec, task_type="worker", task_id=1)

...

if cluster_resolver.task_type == 'worker':
  # Perform something that's only applicable on workers. This block
  # will run on this particular instance since we've specified this task to
  # be a worker in above cluster resolver.
  pass
elif cluster_resolver.task_type == 'ps':
  # Perform something that's only applicable on parameter servers. This
  # block will not run on this particular instance.
  pass

Returns None if such information is not available or is not applicable in the current distributed environment, such as training with tf.distribute.experimental.TPUStrategy.
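
Since task_type can be one of four strings or None, code that branches on it may want to handle every case explicitly. The following is a minimal sketch using plain Python; the helper `describe_task_type` is hypothetical, and the descriptions simply restate the task types listed above.

```python
# Task types named in the description above.
TASK_TYPE_DESCRIPTIONS = {
    "chief": "worker designated with more responsibility",
    "worker": "regular worker for training/evaluation",
    "ps": "parameter server",
    "evaluator": "evaluator that evaluates checkpoints for metrics",
}


def describe_task_type(task_type):
    """Return a description for a task type (hypothetical helper)."""
    if task_type is None:
        # The property returns None when the information is unavailable.
        return "task type not available in this environment"
    if task_type not in TASK_TYPE_DESCRIPTIONS:
        raise ValueError(f"Unknown task type: {task_type!r}")
    return TASK_TYPE_DESCRIPTIONS[task_type]


description = describe_task_type("ps")
```

In real code the argument would come from cluster_resolver.task_type.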

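For more information, see the class docstring of tf.distribute.cluster_resolver.ClusterResolver.

This is an implementation of cluster resolvers for the Google Compute Engine instance group platform. Given a project, zone, and instance group, it retrieves the IP addresses of all instances within the instance group and returns a ClusterResolver object suitable for use with distributed TensorFlow.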
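
To make the paragraph above concrete, the sketch below shows how a list of instance IP addresses and a port could be combined into the job-to-addresses mapping that a cluster specification uses. It is an illustration under stated assumptions, not the library's implementation: the helper `build_cluster_dict` is hypothetical, the IP addresses are made up, and no GCE API call is made.

```python
def build_cluster_dict(instance_ips, port, task_type="worker"):
    """Combine instance IPs and a port into a {job: [address, ...]} mapping."""
    # Each instance in the group is addressed as "ip:port".
    addresses = sorted(f"{ip}:{port}" for ip in instance_ips)
    # All instances of one group belong to the same job (task type).
    return {task_type: addresses}


# Made-up internal IPs standing in for the instances of an instance group.
cluster = build_cluster_dict(["10.0.0.3", "10.0.0.2"], port=8470)
```

A mapping of this shape is what tf.train.ClusterSpec accepts, as in the SimpleClusterResolver examples above.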

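Note: this cluster resolver cannot retrieve task_type, task_id, or rpc_layer. To use it with a distribution strategy such as tf.distribute.experimental.MultiWorkerMirroredStrategy, you need to specify task_type and task_id in the constructor.

Usage example with tf.distribute.Strategy: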

import tensorflow as tf

# On worker 0
cluster_resolver = tf.distribute.cluster_resolver.GCEClusterResolver(
    "my-project", "us-west1", "my-instance-group",
    port=8470,  # required by the signature above; 8470 is the documented default
    task_type="worker", task_id=0)
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
    cluster_resolver=cluster_resolver)

# On worker 1
cluster_resolver = tf.distribute.cluster_resolver.GCEClusterResolver(
    "my-project", "us-west1", "my-instance-group",
    port=8470,
    task_type="worker", task_id=1)
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
    cluster_resolver=cluster_resolver)
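
The example above hard-codes a different task_id on each worker. One way to run the same script on every VM is to read the per-worker values from the command line. The sketch below is only an assumption about how that could be organized: the flag names and the helper `resolver_kwargs` are hypothetical, and the code builds the constructor arguments without contacting GCE.

```python
import argparse


def resolver_kwargs(argv):
    """Build GCEClusterResolver keyword arguments from command-line flags."""
    parser = argparse.ArgumentParser(description="Per-worker resolver settings")
    # Hypothetical flags; each VM is launched with its own --task_id.
    parser.add_argument("--task_type", default="worker")
    parser.add_argument("--task_id", type=int, default=0)
    parser.add_argument("--port", type=int, default=8470)
    args = parser.parse_args(argv)
    return {
        "project": "my-project",
        "zone": "us-west1",
        "instance_group": "my-instance-group",
        "port": args.port,
        "task_type": args.task_type,
        "task_id": args.task_id,
    }


# Worker 1 would be launched with: python train.py --task_id 1
kwargs = resolver_kwargs(["--task_id", "1"])
```

On each VM, the resulting dictionary could then be passed as GCEClusterResolver(**kwargs).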

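Note: This article was selected and compiled by 纯净天空 from the original English documentation on tensorflow.org for tf.distribute.cluster_resolver.GCEClusterResolver. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.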