Python tf.GradientTape usage and code examples

Records operations for automatic differentiation.

Usage

tf.GradientTape(
    persistent=False, watch_accessed_variables=True
)

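Args

persistent Boolean controlling whether a persistent gradient tape is created. Defaults to False, which means at most one call can be made to the gradient() method on this object.
watch_accessed_variables Boolean controlling whether the tape automatically watches any (trainable) variables accessed while the tape is active. Defaults to True, which means gradients can be requested for any result computed in the tape that derives from reading a trainable Variable. If False, users must explicitly watch any Variables they want to request gradients from.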

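Operations are recorded if they are executed within this context manager and at least one of their inputs is being "watched".

Trainable variables (created by tf.Variable or tf.compat.v1.get_variable, where trainable=True is the default in both cases) are watched automatically. Tensors can be watched manually by calling the watch method on this context manager.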
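The watching rules above can be illustrated with a minimal sketch. The variable names (v, n, c) are chosen here for illustration only; the snippet assumes TensorFlow 2.x with eager execution.

```python
import tensorflow as tf

v = tf.Variable(3.0)                   # trainable by default: watched automatically
n = tf.Variable(3.0, trainable=False)  # non-trainable: not watched automatically
c = tf.constant(3.0)                   # plain tensor: not watched unless tape.watch(c) is called

with tf.GradientTape(persistent=True) as tape:
    y_v = v * v  # recorded, because v is watched
    y_n = n * n  # not recorded, no watched input
    y_c = c * c  # not recorded, no watched input

grad_v = tape.gradient(y_v, v)  # 2 * v = 6.0
grad_n = tape.gradient(y_n, n)  # None
grad_c = tape.gradient(y_c, c)  # None
del tape  # release the resources held by the persistent tape

print(grad_v, grad_n, grad_c)
```

A None result usually means the source was never watched, which is worth checking before suspecting the computation itself.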

For example, consider the function y = x * x. The gradient at x = 3.0 can be computed as:

x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)
  y = x * x
dy_dx = g.gradient(y, x)
print(dy_dx)
tf.Tensor(6.0, shape=(), dtype=float32)

GradientTapes can be nested to compute higher-order derivatives. For example,

x = tf.constant(5.0)
with tf.GradientTape() as g:
  g.watch(x)
  with tf.GradientTape() as gg:
    gg.watch(x)
    y = x * x
  dy_dx = gg.gradient(y, x)  # dy_dx = 2 * x
d2y_dx2 = g.gradient(dy_dx, x)  # d2y_dx2 = 2
print(dy_dx)
tf.Tensor(10.0, shape=(), dtype=float32)
print(d2y_dx2)
tf.Tensor(2.0, shape=(), dtype=float32)

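By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method, because the resources are released only when the tape object is garbage collected. For example: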

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
  g.watch(x)
  y = x * x
  z = y * y
dz_dx = g.gradient(z, x)  # (4*x^3 at x = 3)
print(dz_dx)
tf.Tensor(108.0, shape=(), dtype=float32)
dy_dx = g.gradient(y, x)
print(dy_dx)
tf.Tensor(6.0, shape=(), dtype=float32)
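For contrast, the following sketch shows what may happen when the same two gradients are requested from a default (non-persistent) tape. It assumes TensorFlow 2.x, where the second call raises a RuntimeError; the flag name second_call_failed is introduced here only for illustration.

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:  # persistent=False by default
    g.watch(x)
    y = x * x
    z = y * y

dz_dx = g.gradient(z, x)  # first call succeeds: 4 * x**3 = 108.0

try:
    g.gradient(y, x)  # second call on a non-persistent tape
    second_call_failed = False
except RuntimeError:
    # The tape's resources were released after the first gradient() call.
    second_call_failed = True

print(dz_dx, second_call_failed)
```

When a persistent tape is used instead, dropping the reference with del once the gradients are computed lets its resources be reclaimed sooner.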

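By default, GradientTape automatically watches any trainable variables that are accessed inside the context. If you want fine-grained control over which variables are watched, you can disable automatic tracking by passing watch_accessed_variables=False to the tape constructor: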

x = tf.Variable(2.0)
w = tf.Variable(5.0)
with tf.GradientTape(
    watch_accessed_variables=False, persistent=True) as tape:
  tape.watch(x)
  y = x ** 2  # Gradients will be available for `x`.
  z = w ** 3  # No gradients will be available as `w` isn't being watched.
dy_dx = tape.gradient(y, x)
print(dy_dx)
tf.Tensor(4.0, shape=(), dtype=float32)
# No gradients will be available as `w` isn't being watched.
dz_dw = tape.gradient(z, w)
print(dz_dw)
None

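Note that when using models, you should make sure the variables already exist when using watch_accessed_variables=False. Otherwise it is easy for your first iteration to produce no gradients at all: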

inputs = tf.ones((1, 8))  # placeholder input batch; the shape is an assumption
a = tf.keras.layers.Dense(32)
b = tf.keras.layers.Dense(32)

with tf.GradientTape(watch_accessed_variables=False) as tape:
  tape.watch(a.variables)  # Since `a.build` has not been called at this point
                           # `a.variables` will return an empty list and the
                           # tape will not be watching anything.
  result = b(a(inputs))
  tape.gradient(result, a.variables)  # The result of this computation will be
                                      # a list of `None`s since a's variables
                                      # are not being watched.
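The pitfall can be reproduced without Keras. The sketch below uses a hypothetical LazyScale class (a tf.Module that creates its variable on first call) as a stand-in for a layer whose build has not yet run; it is not part of TensorFlow. The first iteration watches an empty collection and yields no gradient, while the second iteration, run after the variable exists, works as expected.

```python
import tensorflow as tf

class LazyScale(tf.Module):
    """Hypothetical stand-in for a layer that creates its variable lazily."""

    def __init__(self):
        super().__init__()
        self.w = None

    def __call__(self, x):
        if self.w is None:
            self.w = tf.Variable(2.0)  # created on first call, like a deferred build
        return self.w * x

layer = LazyScale()
x = tf.constant(3.0)

# First iteration: layer.variables is still empty, so nothing is watched.
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(layer.variables)
    y = layer(x)
first_grads = tape.gradient(y, layer.variables)   # (None,)

# Second iteration: the variable now exists and is watched explicitly.
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(layer.variables)
    y = layer(x)
second_grads = tape.gradient(y, layer.variables)  # d(w * x)/dw = x = 3.0

print(first_grads, second_grads)
```

With a Keras layer, the equivalent fix would be to build the layer, or call it once on sample input, before entering the tape.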

Note that only tensors with real or complex dtypes are differentiable.
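A short sketch of the dtype restriction, assuming TensorFlow 2.x: an integer tensor can be passed to watch (TensorFlow logs a warning), but the gradient with respect to it comes back as None. The names x_int and x_float are illustrative.

```python
import tensorflow as tf

x_int = tf.constant(3)      # int32: not differentiable
x_float = tf.constant(3.0)  # float32: differentiable

with tf.GradientTape(persistent=True) as g:
    g.watch(x_int)    # TensorFlow logs a warning about the non-floating dtype
    g.watch(x_float)
    y_int = x_int * x_int
    y_float = x_float * x_float

grad_int = g.gradient(y_int, x_int)        # None
grad_float = g.gradient(y_float, x_float)  # 2 * x = 6.0
del g

print(grad_int, grad_float)
```

Casting inputs to a floating dtype (for example with tf.cast) before the taped computation avoids this silent None.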


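Note: This article was selected and compiled by 纯净天空 from the original English work tf.GradientTape by the authors at tensorflow.org. Unless otherwise stated, copyright in the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.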