Python tf.keras.mixed_precision.Policy: usage and code examples

A dtype policy for a Keras layer.

Usage

tf.keras.mixed_precision.Policy(
    name
)

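Args

name The policy name, which determines the compute and variable dtypes. It can be any dtype name, such as 'float32' or 'float64', in which case both the compute and variable dtypes are that dtype. It can also be the string 'mixed_float16' or 'mixed_bfloat16', in which case the compute dtype is float16 or bfloat16 and the variable dtype is float32.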
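
As a minimal sketch of how the name maps to the two dtypes, the snippet below constructs three policies and prints the attributes documented on this page. The variable names are illustrative only.

```python
import tensorflow as tf

# A plain dtype name sets both the compute and the variable dtype.
p32 = tf.keras.mixed_precision.Policy("float32")
# A mixed policy computes in a 16-bit dtype but keeps variables in float32.
p16 = tf.keras.mixed_precision.Policy("mixed_float16")
pbf = tf.keras.mixed_precision.Policy("mixed_bfloat16")

for policy in (p32, p16, pbf):
  print(policy.name, policy.compute_dtype, policy.variable_dtype)
```

Constructing a policy this way is rarely necessary in practice, since layers accept the policy name directly.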

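Attributes

compute_dtype The compute dtype of this policy.
This is the dtype in which layers do their computations. Typically, layers also output tensors with the compute dtype.

Note that even if the compute dtype is float16 or bfloat16, hardware devices may not perform individual adds, multiplies, and other fundamental operations in float16 or bfloat16, and may instead perform some of them in float32 for numeric stability. The compute dtype is the dtype of the inputs and outputs of the TensorFlow ops that the layer executes. Internally, many TensorFlow ops do certain calculations in float32 or some other device-internal intermediate format with higher precision than float16/bfloat16, to increase numeric stability.

For example, a tf.keras.layers.Dense layer run on a GPU with a float16 compute dtype passes float16 inputs to tf.linalg.matmul. However, tf.linalg.matmul uses float32 intermediate math. The performance benefit of float16 is still apparent, because of the increased memory bandwidth and because modern GPUs have specialized hardware for computing matmuls on float16 inputs while still keeping intermediate computations in float32.

name Returns the name of this policy.

variable_dtype The variable dtype of this policy.
This is the dtype in which layers create their variables, unless a layer explicitly chooses a different dtype. If this differs from Policy.compute_dtype, layers cast variables to the compute dtype to avoid type errors.

Variable regularizers are run in the variable dtype, not in the compute dtype.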
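
The compute_dtype note above says that intermediate math is often kept in float32 even when inputs and outputs are float16. The following NumPy sketch illustrates only the numerical reason for that choice; it is not how TensorFlow or any GPU implements matmul. It sums 4096 ones twice: once with a float16 accumulator, and once with a float32 accumulator that is cast to float16 at the end.

```python
import numpy as np

values = np.ones(4096, dtype=np.float16)

# Accumulate entirely in float16. Once the running sum reaches 2048,
# adding 1.0 no longer changes it, because adjacent float16 values
# are 2 apart at that magnitude.
acc16 = np.float16(0.0)
for v in values:
  acc16 = np.float16(acc16 + v)

# Accumulate in float32 and cast only the final result back to float16.
acc32 = np.float32(0.0)
for v in values:
  acc32 += np.float32(v)
result = np.float16(acc32)

print(acc16, result)
```

The float16 accumulator stalls at 2048, while the float32 accumulator reaches the exact sum of 4096, which float16 can still represent as an output.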

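A dtype policy determines a layer's computation and variable dtypes. Every layer has a policy. A policy can be passed to the dtype argument of a layer constructor, or a global policy can be set with tf.keras.mixed_precision.set_global_policy.

Typically, you only need to interact with dtype policies when using mixed precision, which means using float16 or bfloat16 for computations and float32 for variables. This is why the term mixed_precision appears in the API name. Mixed precision can be enabled by passing 'mixed_float16' or 'mixed_bfloat16' to tf.keras.mixed_precision.set_global_policy. See the mixed precision guide for more information on how to use mixed precision.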

import tensorflow as tf

tf.keras.mixed_precision.set_global_policy('mixed_float16')
layer1 = tf.keras.layers.Dense(10)
# `layer1` will automatically use mixed precision.
layer1.dtype_policy  # <Policy "mixed_float16">
# Can optionally override layer to use float32 instead of mixed precision.
layer2 = tf.keras.layers.Dense(10, dtype='float32')
layer2.dtype_policy  # <Policy "float32">
# Set policy back to initial float32 for future examples.
tf.keras.mixed_precision.set_global_policy('float32')

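In the example above, passing dtype='float32' to the layer is equivalent to passing dtype=tf.keras.mixed_precision.Policy('float32'). In general, passing a dtype policy name to a layer is equivalent to passing the corresponding policy, so it is never necessary to construct a Policy object explicitly.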
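
A short sketch of that equivalence: the two layers below are configured once by policy name and once by an explicit Policy object, and end up with the same policy. The variable names are illustrative.

```python
import tensorflow as tf

# Same policy, specified by name and by an explicit Policy object.
by_name = tf.keras.layers.Dense(10, dtype="mixed_float16")
by_policy = tf.keras.layers.Dense(
    10, dtype=tf.keras.mixed_precision.Policy("mixed_float16"))

print(by_name.dtype_policy.name, by_policy.dtype_policy.name)
print(by_name.dtype_policy.compute_dtype, by_policy.dtype_policy.compute_dtype)
```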

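Note: if you use the 'mixed_float16' policy, Model.compile automatically wraps the optimizer with a tf.keras.mixed_precision.LossScaleOptimizer. If you use a custom training loop instead of calling Model.compile, you should explicitly use a tf.keras.mixed_precision.LossScaleOptimizer to avoid numeric underflow with float16.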
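
To see why loss scaling matters, the NumPy sketch below shows the underlying arithmetic only; it does not use or reproduce LossScaleOptimizer. The gradient value and the fixed scale of 2**15 are hypothetical numbers chosen for illustration.

```python
import numpy as np

grad = 1e-8             # hypothetical tiny gradient value
loss_scale = 2.0 ** 15  # hypothetical fixed loss scale

# Stored directly in float16, the value is smaller than anything
# float16 can represent and becomes zero.
unscaled = np.float16(grad)

# Multiplying by the scale first keeps the value representable in float16,
scaled = np.float16(grad * loss_scale)
# and dividing by the scale in float32 recovers it approximately.
recovered = np.float32(scaled) / np.float32(loss_scale)

print(unscaled, scaled, recovered)
```

In real training the loss is scaled before backpropagation, so the gradients come out scaled by the same factor, and they are divided by the scale before the weight update.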

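How a layer uses its policy's compute dtype

A layer casts its inputs to its compute dtype. As a result, the layer's computations and output are also in the compute dtype. For example: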

x = tf.ones((4, 4, 4, 4), dtype='float64')
# `layer`'s policy defaults to float32.
layer = tf.keras.layers.Conv2D(filters=4, kernel_size=2)
# Equivalent to layer.dtype_policy.compute_dtype
layer.compute_dtype  # 'float32'
# `layer` casts its inputs to its compute dtype and does computations in
# that dtype.
y = layer(x)
y.dtype  # tf.float32

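Note that the base tf.keras.layers.Layer class inserts the casts. If you subclass your own layer, you do not have to insert any casts.

Currently, only tensors in the first argument to the layer's call method are cast (although this may change in a future minor release). For example: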

class MyLayer(tf.keras.layers.Layer):
  # Bug! `b` will not be cast.
  def call(self, a, b):
    return a + 1., b + 1.

a = tf.constant(1., dtype="float32")
b = tf.constant(1., dtype="float32")
layer = MyLayer(dtype="float64")
x, y = layer(a, b)
x.dtype  # tf.float64
y.dtype  # tf.float32

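If you write your own layer with multiple inputs, you should either explicitly cast the other tensors to self.compute_dtype in call, or accept all tensors in the first argument as a list.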
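
A minimal sketch of the first option, assuming a hypothetical two-input layer named AddOneToBoth: the second input is cast explicitly, so both outputs end up in the layer's compute dtype.

```python
import tensorflow as tf

class AddOneToBoth(tf.keras.layers.Layer):
  """Hypothetical two-input layer that casts its second input itself."""

  def call(self, a, b):
    # `a` is cast by the base Layer class. Cast `b` explicitly so that
    # both inputs share the layer's compute dtype.
    b = tf.cast(b, self.compute_dtype)
    return a + 1., b + 1.

a = tf.constant(1., dtype="float32")
b = tf.constant(1., dtype="float32")
layer = AddOneToBoth(dtype="float64")
x, y = layer(a, b)
print(x.dtype, y.dtype)
```

The explicit cast is harmless if the base class already cast `b`, so the layer behaves the same either way.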

The casting only occurs in TensorFlow 2. If tf.compat.v1.disable_v2_behavior() has been called, you can enable the casting behavior with tf.compat.v1.keras.layers.enable_v2_dtype_behavior().

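How a layer uses its policy's variable dtype

The default dtype of variables created by tf.keras.layers.Layer.add_weight is the variable dtype of the layer's policy.

If a layer's compute and variable dtypes differ, add_weight wraps floating-point variables with a special wrapper called an AutoCastVariable. An AutoCastVariable is identical to the original variable, except that it casts itself to the layer's compute dtype when used within Layer.call. This means that if you are writing a layer, you do not have to explicitly cast the variables to the layer's compute dtype. For example: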

class SimpleDense(tf.keras.layers.Layer):

  def build(self, input_shape):
    # With mixed precision, self.kernel is a float32 AutoCastVariable
    self.kernel = self.add_weight(name='kernel', shape=(input_shape[-1], 10))

  def call(self, inputs):
    # With mixed precision, self.kernel will be cast to float16
    return tf.linalg.matmul(inputs, self.kernel)

layer = SimpleDense(dtype='mixed_float16')
y = layer(tf.ones((10, 10)))
y.dtype  # tf.float16
layer.kernel.dtype  # tf.float32

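Layer authors can prevent a variable from being wrapped with an AutoCastVariable by passing experimental_autocast=False to add_weight, which is useful if the float32 value of the variable must be accessed within the layer.

How to write a layer that supports mixed precision and float64

In most cases, a layer automatically supports mixed precision and float64 without any additional work, because the base layer automatically casts inputs, creates variables of the correct type and, in the case of mixed precision, wraps variables with AutoCastVariables.

The main case where you need extra work to support mixed precision or float64 is when you create a new tensor, such as with tf.ones or tf.random.normal, in which case you must create the tensor with the correct dtype. For example, if you call tf.random.normal, you must pass the compute dtype, which is the dtype the inputs have been cast to: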

class AddRandom(tf.keras.layers.Layer):

  def call(self, inputs):
    # We must pass `dtype=inputs.dtype`, otherwise a TypeError may
    # occur when adding `inputs` to `rand`.
    rand = tf.random.normal(shape=inputs.shape, dtype=inputs.dtype)
    return inputs + rand

layer = AddRandom(dtype='mixed_float16')
y = layer(x)
y.dtype  # tf.float16

If you did not pass dtype=inputs.dtype to tf.random.normal, a TypeError would occur. This is because the dtype of tf.random.normal defaults to "float32", but the input dtype is float16. You cannot add a float32 tensor to a float16 tensor.
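
The same rule applies to tf.ones. The sketch below assumes a hypothetical layer named AddOnes and shows that creating the new tensor in the inputs' dtype lets one layer definition work under both a mixed precision policy and float64.

```python
import tensorflow as tf

class AddOnes(tf.keras.layers.Layer):
  """Hypothetical layer that adds a freshly created tensor of ones."""

  def call(self, inputs):
    # Create the new tensor in the dtype the inputs were cast to.
    ones = tf.ones(tf.shape(inputs), dtype=inputs.dtype)
    return inputs + ones

x = tf.zeros((2, 3))  # float32 input
y16 = AddOnes(dtype="mixed_float16")(x)
y64 = AddOnes(dtype="float64")(x)
print(y16.dtype, y64.dtype)
```

Using inputs.dtype rather than a hard-coded dtype keeps the layer independent of whichever policy it is constructed with.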


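Note: this article was selected and compiled by 純淨天空 from the original English documentation for tf.keras.mixed_precision.Policy on tensorflow.org. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.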