Python tf.distribute.experimental.CommunicationOptions usage and code examples

Options for cross-device communication, such as all-reduce.

Usage

tf.distribute.experimental.CommunicationOptions(
    bytes_per_pack=0, timeout_seconds=None,
    implementation=tf.distribute.experimental.CollectiveCommunication.AUTO
)

Args

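bytes_per_pack A non-negative integer. Breaks collective operations into packs of the given size in bytes. If zero, the value is determined automatically. This hint is respected by all multi-replica strategies except TPUStrategy.
timeout_seconds A float or None; the timeout in seconds. If not None, the collective raises tf.errors.DeadlineExceededError when it takes longer than this timeout. Zero disables the timeout. This is useful when debugging hangs. It should only be used for debugging, because it creates a new thread for each collective, which amounts to an overhead of timeout_seconds * num_collectives_per_second additional threads. This only works for tf.distribute.experimental.MultiWorkerMirroredStrategy.
implementation A tf.distribute.experimental.CommunicationImplementation. This is a hint for the preferred communication implementation. Possible values include AUTO, RING, and NCCL. NCCL is generally more performant on GPU but does not work on CPU. This only works for tf.distribute.experimental.MultiWorkerMirroredStrategy.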
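
The thread overhead mentioned for timeout_seconds can be estimated with simple arithmetic. The sketch below is illustrative only: the function name estimated_timeout_threads and the figure of 50 collectives per second are assumptions, not part of the TensorFlow API.

```python
def estimated_timeout_threads(timeout_seconds, collectives_per_second):
    """Rough estimate of extra timeout threads alive at the same time.

    Hypothetical helper: applies the documented formula
    timeout_seconds * num_collectives_per_second.
    """
    # None or zero means the timeout is disabled, so no extra threads.
    if timeout_seconds is None or timeout_seconds == 0:
        return 0.0
    return timeout_seconds * collectives_per_second


# Assumed workload: 50 collectives per second with a 120 second timeout.
extra_threads = estimated_timeout_threads(120.0, 50)
print(extra_threads)  # 6000.0
```

At that assumed rate, thousands of extra threads could be alive at once, which illustrates why the option is recommended for debugging only.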

Raises

ValueError When an argument has an invalid value.
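
As a minimal sketch of the validation described above, the snippet below constructs one valid options object and then passes a negative bytes_per_pack, which contradicts the documented non-negative requirement. It assumes a TensorFlow 2.x installation that provides tf.distribute.experimental.CommunicationOptions; the variable name raised is only for illustration.

```python
import tensorflow as tf

# A valid configuration: 1 MiB packs, implementation left to the runtime.
options = tf.distribute.experimental.CommunicationOptions(
    bytes_per_pack=1024 * 1024,
    implementation=tf.distribute.experimental.CommunicationImplementation.AUTO,
)

# A negative pack size is an invalid value and is expected to be rejected.
try:
    tf.distribute.experimental.CommunicationOptions(bytes_per_pack=-1)
    raised = False
except ValueError:
    raised = True

print("ValueError raised:", raised)
```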

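This can be passed to methods such as tf.distribute.get_replica_context().all_reduce() to optimize the performance of collective operations. Note that these are only hints and may or may not change the actual behavior. Some options apply only to certain strategies and are ignored by the others.

One common optimization is to break the gradient all-reduce into multiple packs so that the weight updates can overlap with the gradient all-reduce.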
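
To make the idea of packing concrete, here is a pure-Python sketch that greedily groups gradient sizes into packs of at most bytes_per_pack bytes. It is an illustration under stated assumptions, not TensorFlow's actual packing algorithm: the function pack_by_bytes, the greedy strategy, and the gradient sizes are all hypothetical.

```python
def pack_by_bytes(tensor_sizes, bytes_per_pack):
    """Greedily group consecutive tensor sizes (in bytes) into packs.

    Hypothetical helper for illustration; TensorFlow's internal logic may differ.
    """
    if bytes_per_pack < 0:
        raise ValueError("bytes_per_pack must be non-negative")
    if bytes_per_pack == 0:
        # Stand-in for "determined automatically": keep everything in one pack.
        return [list(tensor_sizes)]
    packs, current, current_bytes = [], [], 0
    for size in tensor_sizes:
        # Close the current pack if adding this tensor would exceed the budget.
        if current and current_bytes + size > bytes_per_pack:
            packs.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        packs.append(current)
    return packs


MB = 1024 * 1024
# Assumed gradient sizes for five variables.
grad_sizes = [30 * MB, 30 * MB, 10 * MB, 45 * MB, 5 * MB]
packs = pack_by_bytes(grad_sizes, 50 * MB)
print([sum(p) // MB for p in packs])  # [30, 40, 50]
```

Each pack can then be reduced as a separate collective, so in principle the update for an earlier pack can start while later packs are still being reduced.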

Example:

options = tf.distribute.experimental.CommunicationOptions(
    bytes_per_pack=50 * 1024 * 1024,
    timeout_seconds=120.0,
    implementation=tf.distribute.experimental.CommunicationImplementation.NCCL
)
grads = tf.distribute.get_replica_context().all_reduce(
    'sum', grads, options=options)
optimizer.apply_gradients(zip(grads, vars),
    experimental_aggregate_gradients=False)


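Note: This article was selected and compiled by 纯净天空 from the original English documentation for tf.distribute.experimental.CommunicationOptions on tensorflow.org. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not repost or copy this translation without permission or authorization.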