Python tf.GradientTape usage and code examples

Records operations for automatic differentiation.

Usage

tf.GradientTape(
    persistent=False, watch_accessed_variables=True
)

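Parameters

persistent Boolean controlling whether a persistent gradient tape is created. Defaults to False, which means at most one call can be made to the gradient() method on this object.
watch_accessed_variables Boolean controlling whether the tape automatically watches any (trainable) variables accessed while the tape is active. Defaults to True, meaning gradients can be requested for any result computed in the tape that derives from reading a trainable Variable. If False, users must explicitly watch every Variable they want to request gradients for.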

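Operations are recorded if they are executed within this context manager and at least one of their inputs is being "watched".

Trainable variables (created by tf.Variable or tf.compat.v1.get_variable, where trainable=True is the default in both cases) are watched automatically. Tensors can be watched manually by calling the watch method on this context manager.

For example, consider the function y = x * x. The gradient at x = 3.0 can be computed as: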

import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)
  y = x * x
dy_dx = g.gradient(y, x)
print(dy_dx)
# tf.Tensor(6.0, shape=(), dtype=float32)
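The example above watches a tf.constant by hand. As a minimal sketch of the automatic case described earlier, the snippet below swaps the constant for a trainable tf.Variable, which needs no watch call, and contrasts it with a constant that is never watched. The names tape, c and z are illustrative only.

```python
import tensorflow as tf

# A trainable variable is watched automatically; no tape.watch() is needed.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
dy_dx = tape.gradient(y, x)
print(dy_dx)  # tf.Tensor(6.0, shape=(), dtype=float32)

# A constant that is never watched is not recorded, so its gradient is None.
c = tf.constant(3.0)
with tf.GradientTape() as tape:
    z = c * c
dz_dc = tape.gradient(z, c)
print(dz_dc)  # None
```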

GradientTapes can be nested to compute higher-order derivatives. For example:

x = tf.constant(5.0)
with tf.GradientTape() as g:
  g.watch(x)
  with tf.GradientTape() as gg:
    gg.watch(x)
    y = x * x
  dy_dx = gg.gradient(y, x)  # dy_dx = 2 * x
d2y_dx2 = g.gradient(dy_dx, x)  # d2y_dx2 = 2
print(dy_dx)
# tf.Tensor(10.0, shape=(), dtype=float32)
print(d2y_dx2)
# tf.Tensor(2.0, shape=(), dtype=float32)
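The same nesting pattern applies to other functions. The sketch below uses y = x ** 3 with a trainable tf.Variable, so no watch calls are needed. The names outer and inner are illustrative. The first-order gradient is taken inside the outer tape's context so that the outer tape records it.

```python
import tensorflow as tf

x = tf.Variable(1.0)
with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = x ** 3
    # Computed inside `outer` so that this gradient is itself recorded.
    dy_dx = inner.gradient(y, x)   # 3 * x**2 = 3.0 at x = 1.0
d2y_dx2 = outer.gradient(dy_dx, x)  # 6 * x = 6.0 at x = 1.0

print(dy_dx)
print(d2y_dx2)
```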

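By default, the resources held by a GradientTape are released as soon as the GradientTape.gradient() method is called. To compute multiple gradients over the same computation, create a persistent gradient tape. This allows multiple calls to the gradient() method, because the resources are released only when the tape object is garbage collected. For example: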

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as g:
  g.watch(x)
  y = x * x
  z = y * y
dz_dx = g.gradient(z, x)  # (4*x^3 at x = 3)
print(dz_dx)
# tf.Tensor(108.0, shape=(), dtype=float32)
dy_dx = g.gradient(y, x)
print(dy_dx)
# tf.Tensor(6.0, shape=(), dtype=float32)
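For contrast, here is a sketch of the same computation on a default (non-persistent) tape. The first gradient() call works, while the second raises a RuntimeError because the tape's resources have already been released. The flag second_call_failed is an illustrative helper, not part of the API.

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:  # persistent defaults to False
    g.watch(x)
    y = x * x
    z = y * y

dz_dx = g.gradient(z, x)  # first call succeeds: 4 * x**3 = 108.0

try:
    g.gradient(y, x)  # second call on a non-persistent tape
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("RuntimeError:", err)
```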

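By default, GradientTape automatically watches any trainable variables accessed inside the context. For fine-grained control over which variables are watched, disable automatic tracking by passing watch_accessed_variables=False to the tape constructor: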

x = tf.Variable(2.0)
w = tf.Variable(5.0)
with tf.GradientTape(
    watch_accessed_variables=False, persistent=True) as tape:
  tape.watch(x)
  y = x ** 2  # Gradients will be available for `x`.
  z = w ** 3  # No gradients will be available as `w` isn't being watched.
dy_dx = tape.gradient(y, x)
print(dy_dx)
# tf.Tensor(4.0, shape=(), dtype=float32)
# No gradients will be available as `w` isn't being watched.
dz_dw = tape.gradient(z, w)
print(dz_dw)
# None
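One way to check what a tape is tracking is its watched_variables() method, which this page does not otherwise cover. The sketch below repeats the setup above and inspects the tape afterwards. With automatic watching disabled, it should report only the explicitly watched x.

```python
import tensorflow as tf

x = tf.Variable(2.0)
w = tf.Variable(5.0)
with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(x)
    y = x ** 2
    z = w ** 3  # `w` is read here but never watched

# Variables the tape is tracking, in order of construction.
watched = tape.watched_variables()
print(len(watched))
print([v.name for v in watched])
```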

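Note that when working with models, you should make sure your variables exist when using watch_accessed_variables=False. Otherwise it is easy to end up with no gradients on the first iteration: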

a = tf.keras.layers.Dense(32)
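b = tf.keras.layers.Dense(32)
inputs = tf.ones((1, 8))  # Placeholder input with an assumed shape; the original snippet leaves `inputs` undefined.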

with tf.GradientTape(watch_accessed_variables=False) as tape:
  tape.watch(a.variables)  # Since `a.build` has not been called at this point
                           # `a.variables` will return an empty list and the
                           # tape will not be watching anything.
  result = b(a(inputs))
  tape.gradient(result, a.variables)  # The result of this computation will be
                                      # a list of `None`s since a's variables
                                      # are not being watched.
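A possible way around this pitfall is to make sure the variables exist before the tape starts watching. The sketch below avoids Keras and uses a hypothetical tf.Module subclass, LazyScale, that creates its weight on first call (a stand-in for an unbuilt layer). Calling it once up front creates the variable, so the later watch call has something to watch.

```python
import tensorflow as tf


class LazyScale(tf.Module):
    """Hypothetical stand-in for a layer that creates its weight on first call."""

    def __init__(self):
        super().__init__()
        self.w = None

    def __call__(self, x):
        if self.w is None:
            self.w = tf.Variable(2.0)  # created lazily, like an unbuilt layer
        return self.w * x


layer = LazyScale()
x = tf.constant(3.0)
layer(x)  # first call creates the variable before the tape is opened

with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(layer.trainable_variables)  # non-empty now
    y = layer(x)

grads = tape.gradient(y, layer.trainable_variables)
print(grads[0])  # d(w * x)/dw = x = 3.0
```

If the up-front call is skipped, layer.trainable_variables is empty when watch runs and the gradients come back as None, which is the behaviour the original snippet illustrates.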

Note that only tensors with real or complex dtypes are differentiable.
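As a small sketch of this restriction, the snippet below watches an integer tensor and a float tensor on the same persistent tape. The names x_int and x_float are illustrative. TensorFlow is expected to log a warning when the integer tensor is watched, and its gradient comes back as None.

```python
import tensorflow as tf

x_int = tf.constant(3)      # int32: not differentiable
x_float = tf.constant(3.0)  # float32: differentiable

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x_int)    # logs a warning about the non-floating dtype
    tape.watch(x_float)
    y_int = x_int * x_int
    y_float = x_float * x_float

grad_int = tape.gradient(y_int, x_int)
grad_float = tape.gradient(y_float, x_float)
print(grad_int)    # None
print(grad_float)  # tf.Tensor(6.0, shape=(), dtype=float32)
```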

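Note: this article was selected and compiled by 純淨天空 from the original English documentation for tf.GradientTape on tensorflow.org. Unless otherwise stated, copyright in the original code belongs to the original author. Do not reproduce or copy this translation without permission or authorization.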