Python tf.distribute.cluster_resolver.TPUClusterResolver usage and code examples

Cluster Resolver for Google Cloud TPU.

Inherits From: ClusterResolver

Usage

tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu=None, zone=None, project=None, job_name='worker',
    coordinator_name=None, coordinator_address=None,
    credentials='default', service=None, discovery_url=None
)

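Args

tpu A string corresponding to the TPU to use. It can be a TPU name or a TPU worker gRPC address. If not set, it will try to automatically resolve the TPU address on Cloud TPUs. If set to "local", it will assume that the TPU is directly connected to the VM instead of over the network.
zone The zone where the TPU is located. If omitted or empty, we assume that the zone of the TPU is the same as the zone of the GCE VM, which we will try to discover from the GCE metadata service.
project The name of the GCP project containing Cloud TPUs. If omitted or empty, we will try to discover the project name of the GCE VM from the GCE metadata service.
job_name The name of the TensorFlow job the TPUs belong to.
coordinator_name The name to use for the coordinator. Set to None if the coordinator should not be included in the computed ClusterSpec.
coordinator_address The address of the coordinator (typically an ip:port pair). If set to None, a TF server will be started. If coordinator_name is None, a TF server will not be started even if coordinator_address is None.
credentials GCE credentials. If None, default credentials from oauth2client are used.
service The GCE API object returned by the googleapiclient.discovery function. If you specify a custom service object, the credentials parameter is ignored.
discovery_url A URL template that points to the location of the discovery service. It should have two parameters, {api} and {apiVersion}, that when filled in produce an absolute URL to the discovery document for that service. The environment variable 'TPU_API_DISCOVERY_URL' overrides this.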
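Two of the argument rules above are easy to misread: when a TF server is started for the coordinator, and how the discovery_url template is filled in (with TPU_API_DISCOVERY_URL taking precedence). The following is a minimal plain-Python sketch of those rules as stated in the argument descriptions, not TensorFlow's implementation. The helper names `starts_coordinator_server` and `resolve_discovery_url`, the coordinator address, and the example template host are hypothetical.

```python
import os
from typing import Optional


def starts_coordinator_server(coordinator_name: Optional[str],
                              coordinator_address: Optional[str]) -> bool:
    """Hypothetical helper: True if a TF server would be started.

    Per the argument descriptions: no server is started when
    coordinator_name is None; otherwise a server is started only when
    coordinator_address is None.
    """
    if coordinator_name is None:
        return False
    return coordinator_address is None


def resolve_discovery_url(discovery_url: Optional[str], api: str,
                          api_version: str) -> Optional[str]:
    """Hypothetical helper: fill in a discovery URL template.

    The TPU_API_DISCOVERY_URL environment variable, if set, takes
    precedence over the discovery_url argument.
    """
    template = os.environ.get("TPU_API_DISCOVERY_URL", discovery_url)
    if template is None:
        # The client library's own default would apply; not modelled here.
        return None
    # The template uses the placeholders {api} and {apiVersion}.
    return template.format(api=api, apiVersion=api_version)


# Placeholder template; this host is not a real discovery endpoint.
TEMPLATE = "https://discovery.example.com/{api}/{apiVersion}/rest"

print(starts_coordinator_server("coordinator", None))             # True
print(starts_coordinator_server("coordinator", "10.0.0.2:8470"))  # False
print(starts_coordinator_server(None, None))                      # False
# Prints the filled-in template when TPU_API_DISCOVERY_URL is unset.
print(resolve_discovery_url(TEMPLATE, "tpu", "v1"))
```

The environment lookup is done first so that an operator can redirect discovery without touching code, which matches the override behavior described for discovery_url.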

Raises

ImportError If googleapiclient is not installed.
ValueError If no TPUs are specified.
RuntimeError If an empty TPU name is specified and this is running in a Google Cloud environment.

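Attributes

environment Returns the current environment which TensorFlow is running in.

task_id Returns the task id this ClusterResolver indicates.

In a TensorFlow distributed environment, each job may have an applicable task id, which is the index of the instance within its task type. This is useful when you need to run specific code according to the task index. For example,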

import tensorflow as tf

cluster_spec = tf.train.ClusterSpec({
    "ps": ["localhost:2222", "localhost:2223"],
    "worker": ["localhost:2224", "localhost:2225", "localhost:2226"]
})

# SimpleClusterResolver is used here for illustration; other cluster
# resolvers may be used for other sources of task type/id.
cluster_resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
    cluster_spec, task_type="worker", task_id=0)

...

if cluster_resolver.task_type == 'worker' and cluster_resolver.task_id == 0:
  # Perform something that's only applicable on 'worker' type, id 0. This
  # block will run on this particular instance since we've specified this
  # task to be a 'worker', id 0 in the cluster resolver above.
  pass
else:
  # Perform something that's only applicable on other ids. This block will
  # not run on this particular instance.
  pass

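If such information is not available or is not applicable in the current distributed environment (for example, when training with tf.distribute.cluster_resolver.TPUClusterResolver), this returns None.

See the class docstring of tf.distribute.cluster_resolver.ClusterResolver for more information.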
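Because a task id is simply the instance's index within its task type, the pair (task_type, task_id) picks out exactly one address in the cluster. The sketch below illustrates this with a plain dictionary mirroring the ClusterSpec in the example above; `task_address` is a hypothetical helper (tf.train.ClusterSpec offers a comparable `task_address(job_name, task_index)` method), and returning None for missing values reflects the None case described above.

```python
from typing import Dict, List, Optional

# Plain-dict mirror of the ClusterSpec used in the example above.
CLUSTER: Dict[str, List[str]] = {
    "ps": ["localhost:2222", "localhost:2223"],
    "worker": ["localhost:2224", "localhost:2225", "localhost:2226"],
}


def task_address(cluster: Dict[str, List[str]], task_type: Optional[str],
                 task_id: Optional[int]) -> Optional[str]:
    """Hypothetical helper: look up the address for (task_type, task_id).

    Returns None when either value is None, as a resolver may report when
    task type/id do not apply (for example, with TPUClusterResolver).
    """
    if task_type is None or task_id is None:
        return None
    # task_id is the index of the instance within its task type.
    return cluster[task_type][task_id]


print(task_address(CLUSTER, "worker", 0))  # localhost:2224
print(task_address(CLUSTER, "ps", 1))      # localhost:2223
print(task_address(CLUSTER, None, None))   # None
```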

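task_type Returns the task type this ClusterResolver indicates.

In a TensorFlow distributed environment, each job may have an applicable task type. Valid task types in TensorFlow include 'chief' (a worker that is designated with more responsibility), 'worker' (a regular worker for training/evaluation), 'ps' (a parameter server), and 'evaluator' (an evaluator that evaluates checkpoints for metrics).

See Multi-worker configuration for more information about the 'chief' and 'worker' task types, which are the most commonly used.

Having access to such information is useful when you need to run specific code according to the task type. For example,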

import tensorflow as tf

cluster_spec = tf.train.ClusterSpec({
    "ps": ["localhost:2222", "localhost:2223"],
    "worker": ["localhost:2224", "localhost:2225", "localhost:2226"]
})

# SimpleClusterResolver is used here for illustration; other cluster
# resolvers may be used for other sources of task type/id.
cluster_resolver = tf.distribute.cluster_resolver.SimpleClusterResolver(
    cluster_spec, task_type="worker", task_id=1)

...

if cluster_resolver.task_type == 'worker':
  # Perform something that's only applicable on workers. This block
  # will run on this particular instance since we've specified this task to
  # be a worker in the cluster resolver above.
  pass
elif cluster_resolver.task_type == 'ps':
  # Perform something that's only applicable on parameter servers. This
  # block will not run on this particular instance.
  pass

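If such information is not available or is not applicable in the current distributed environment (for example, when training with tf.distribute.experimental.TPUStrategy), this returns None.

See the class docstring of tf.distribute.cluster_resolver.ClusterResolver for more information.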
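A common way to act on task_type is to dispatch to a per-role function. The sketch below does this in plain Python for the four task types listed above. The handler table and its strings are hypothetical placeholders, and treating None (as reported, for example, when training on TPUs) as "run as a single client" is an assumption of this sketch rather than documented behavior.

```python
from typing import Callable, Dict, Optional

# Hypothetical per-role entry points; a real job would train, host
# variables, evaluate checkpoints, and so on.
HANDLERS: Dict[str, Callable[[], str]] = {
    "chief": lambda: "chief: worker with extra responsibilities",
    "worker": lambda: "worker: regular training/evaluation",
    "ps": lambda: "ps: hosts variables as a parameter server",
    "evaluator": lambda: "evaluator: evaluates checkpoints for metrics",
}


def dispatch(task_type: Optional[str]) -> str:
    """Run the handler registered for task_type.

    Assumption: None means the resolver reports no task type, and the
    program then runs as a single client.
    """
    if task_type is None:
        return "no task type: running as a single client"
    try:
        handler = HANDLERS[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None
    return handler()


print(dispatch("worker"))  # worker: regular training/evaluation
print(dispatch(None))      # no task type: running as a single client
```

A lookup table keeps the role logic in one place and makes an unexpected task type fail loudly instead of silently falling through an if/elif chain.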

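This is an implementation of cluster resolvers for the Google Cloud TPU service.

TPUClusterResolver supports the following distinct environments:
Google Compute Engine
Google Kubernetes Engine
Google internal

It can be passed into tf.distribute.TPUStrategy to support TF2 training on Cloud TPUs.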
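A typical TF2 setup connects to the cluster, initializes the TPU system, and then builds the strategy from the resolver. This is a minimal sketch assuming TensorFlow 2.3 or later (where tf.distribute.TPUStrategy is available) and a reachable, network-attached Cloud TPU. The function name `connect_to_tpu` and the TPU name in the usage comment are hypothetical, and the default tpu=None relies on the automatic address resolution described for the tpu argument.

```python
from typing import Optional


def connect_to_tpu(tpu: Optional[str] = None):
    """Build a tf.distribute.TPUStrategy for the given TPU.

    tpu may be a TPU name, a TPU worker gRPC address, or None to let
    TPUClusterResolver try to resolve the address automatically.
    """
    # Imported lazily so this module can be loaded without TensorFlow.
    import tensorflow as tf

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu)
    # Make the remote TPU workers visible to this client.
    tf.config.experimental_connect_to_cluster(resolver)
    # Initialize the TPU system before creating the strategy.
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)


# Usage (requires TensorFlow and an accessible Cloud TPU):
#   strategy = connect_to_tpu("my-tpu")  # "my-tpu" is a placeholder name
#   with strategy.scope():
#       model = ...  # variables created here are placed on the TPU
```

Creating the model inside `strategy.scope()` is what lets the strategy place and replicate its variables across TPU cores.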


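Note: This article was curated by 纯净天空 from the original English documentation of tf.distribute.cluster_resolver.TPUClusterResolver on tensorflow.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.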