Python PyTorch LayerNorm Usage and Code Examples

This article briefly describes how to use torch.nn.LayerNorm in Python.

Usage: class torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None)

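Parameters:

normalized_shape (int, list, or torch.Size) -
The input shape from an expected input of size

[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]
If a single integer is used, it is treated as a single-element list, and this module normalizes over the last dimension, which is expected to have that specific size.
eps - A value added to the denominator for numerical stability. Default: 1e-5
elementwise_affine - A boolean value. When set to True, this module has learnable per-element affine parameters, initialized to 1 (for weights) and 0 (for biases). Default: True.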
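As a minimal sketch of how normalized_shape is interpreted, the snippet below builds the module from an integer and from a list, then applies a two-dimensional normalized_shape to a small tensor. The variable names and tensor sizes are arbitrary choices for illustration, not part of the API.

```python
import torch
from torch import nn

# An integer is treated as a single-element list: only the last dimension is normalized.
ln_int = nn.LayerNorm(10)
ln_list = nn.LayerNorm([10])
print(ln_int.normalized_shape)
print(ln_list.normalized_shape)

# A list of sizes must match the trailing dimensions of the input.
x = torch.randn(4, 3, 5)                 # illustrative input; sizes are assumptions
ln_2d = nn.LayerNorm([3, 5], eps=1e-5)   # normalizes over the last two dimensions
y = ln_2d(x)
print(y.shape)
```

Passing a normalized_shape that does not match the trailing dimensions of the input raises an error at call time, so it helps to check the input layout first.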

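Variables:

~LayerNorm.weight - The learnable weights of the module, of shape \text{normalized\_shape}, when elementwise_affine is set to True. The values are initialized to 1.
~LayerNorm.bias - The learnable bias of the module, of shape \text{normalized\_shape}, when elementwise_affine is set to True. The values are initialized to 0.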
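The following sketch inspects these attributes on a freshly constructed module. The shape [3, 5] is an arbitrary value chosen for illustration.

```python
import torch
from torch import nn

ln = nn.LayerNorm([3, 5])
# Both parameters have the same shape as normalized_shape.
print(ln.weight.shape, ln.bias.shape)
# Freshly constructed: weight is all ones, bias is all zeros.
print(bool(torch.all(ln.weight == 1)), bool(torch.all(ln.bias == 0)))

# With elementwise_affine=False, no learnable parameters are registered.
ln_plain = nn.LayerNorm([3, 5], elementwise_affine=False)
print(ln_plain.weight, ln_plain.bias)
num_params = len(list(ln_plain.parameters()))
```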

Applies Layer Normalization over a mini-batch of inputs, as described in the paper Layer Normalization.

y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta

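The mean and standard deviation are computed over the last D dimensions, where D is the number of dimensions of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). \gamma and \beta are learnable affine transform parameters of shape normalized_shape if elementwise_affine is True. The standard deviation is computed from the biased variance estimator, equivalent to torch.var(input, unbiased=False).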
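To make the formula concrete, here is a minimal sketch that recomputes the normalization by hand for normalized_shape (3, 5) and compares it with the module's output. The input size, the seed, and the tolerance passed to torch.allclose are assumptions made for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 3, 5)   # illustrative input
eps = 1e-5
ln = nn.LayerNorm([3, 5], eps=eps)

# Statistics over the last two dimensions, kept so they broadcast against x.
mean = x.mean(dim=(-2, -1), keepdim=True)
var = x.var(dim=(-2, -1), unbiased=False, keepdim=True)  # biased estimator

# y = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta
manual = (x - mean) / torch.sqrt(var + eps) * ln.weight + ln.bias
reference = ln(x)
matches = torch.allclose(manual, reference, atol=1e-5)
print(matches)
```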

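Note

Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane via the affine option, Layer Normalization applies a per-element scale and bias via elementwise_affine.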
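The difference shows up in the parameter shapes. The sketch below compares the three layers on a hypothetical (C, H, W) = (5, 10, 10) layout. nn.BatchNorm2d and nn.InstanceNorm2d are used here as representative batch and instance normalization modules.

```python
from torch import nn

C, H, W = 5, 10, 10   # illustrative sizes
bn = nn.BatchNorm2d(C, affine=True)
inorm = nn.InstanceNorm2d(C, affine=True)
ln = nn.LayerNorm([C, H, W], elementwise_affine=True)

print(bn.weight.shape)     # one scale per channel
print(inorm.weight.shape)  # one scale per channel
print(ln.weight.shape)     # one scale per element of normalized_shape
```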

This layer uses statistics computed from the input data in both training and evaluation modes.
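A quick way to see this is to run the same input through the module in both modes, as in the following sketch. The tensor sizes and seed are arbitrary, and the check that the module holds no buffers is an added illustration showing that no running statistics are tracked.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(20, 5, 10)   # illustrative input
ln = nn.LayerNorm(10)

ln.train()
out_train = ln(x)
ln.eval()
out_eval = ln(x)

# The statistics always come from the current input, so both modes agree.
same_output = torch.allclose(out_train, out_eval)
# No running mean or variance buffers are kept.
num_buffers = len(list(ln.buffers()))
print(same_output, num_buffers)
```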

Shape:

Input: (N, *)
Output: (N, *) (same shape as the input)

Examples:

>>> import torch
>>> from torch import nn
>>> # NLP Example
>>> batch, sentence_length, embedding_dim = 20, 5, 10
>>> embedding = torch.randn(batch, sentence_length, embedding_dim)
>>> layer_norm = nn.LayerNorm(embedding_dim)
>>> # Activate module
>>> layer_norm(embedding)
>>>
>>> # Image Example
>>> N, C, H, W = 20, 5, 10, 10
>>> input = torch.randn(N, C, H, W)
>>> # Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
>>> layer_norm = nn.LayerNorm([C, H, W])
>>> output = layer_norm(input)
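As a sanity check on the image example, the sketch below verifies that each sample has approximately zero mean and unit variance over its channel and spatial dimensions when the affine parameters are still at their initial values. The seed and the reduction dimensions (1, 2, 3) are choices made for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
N, C, H, W = 20, 5, 10, 10
x = torch.randn(N, C, H, W)
output = nn.LayerNorm([C, H, W])(x)

# Per-sample statistics over the normalized dimensions.
per_sample_mean = output.mean(dim=(1, 2, 3))
per_sample_var = output.var(dim=(1, 2, 3), unbiased=False)
print(per_sample_mean.abs().max())          # close to 0
print(per_sample_var.min(), per_sample_var.max())  # close to 1
```

The variance is slightly below 1 because eps is added inside the square root in the denominator.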

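Note: This article was selected and compiled by 純淨天空 from the original English documentation of torch.nn.LayerNorm on pytorch.org. Unless otherwise stated, the copyright of the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.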