Python PyTorch LayerNorm Usage and Code Examples

This article briefly introduces the usage of torch.nn.LayerNorm in Python.

Usage: class torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, device=None, dtype=None)
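As a minimal sketch of the constructor above, the snippet below builds one module with the defaults and one with explicit keyword arguments. The values `eps=1e-6`, `device="cpu"` and `dtype=torch.float64` are arbitrary choices for illustration; `device` and `dtype` simply control where and in which precision the learnable parameters are created.

```python
import torch
from torch import nn

# Default construction: eps=1e-5, elementwise_affine=True
ln_default = nn.LayerNorm(8)

# Factory kwargs: the parameters are created directly with this dtype on this device
ln_double = nn.LayerNorm(8, eps=1e-6, device="cpu", dtype=torch.float64)

print(ln_default.eps, ln_default.elementwise_affine)  # 1e-05 True
print(ln_double.weight.dtype)  # torch.float64
```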

Parameters:

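normalized_shape (int or list or torch.Size) -
input shape from an expected input of size

[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]
If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.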
eps - a value added to the denominator for numerical stability. Default: 1e-5
elementwise_affine - a boolean value that, when set to True, gives this module learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.
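A short sketch of how these parameters change the module. The shapes `10` and `[3, 5]` are arbitrary; the point is that an int becomes a one-element shape, a list covers several trailing dimensions, and `elementwise_affine=False` leaves the module with no learnable parameters at all.

```python
import torch
from torch import nn

# A single int is treated as a one-element shape: normalize over the last dim (size 10)
ln_int = nn.LayerNorm(10)
# A list normalizes over the last len(list) dims, here the trailing (3, 5)
ln_list = nn.LayerNorm([3, 5])
# Without the affine transform, weight and bias are not created
ln_plain = nn.LayerNorm(10, elementwise_affine=False)

print(ln_int.normalized_shape)   # (10,)
print(ln_list.normalized_shape)  # (3, 5)
print(ln_plain.weight, ln_plain.bias)  # None None
print(sum(p.numel() for p in ln_plain.parameters()))  # 0
```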

Variables:

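~LayerNorm.weight - the learnable weights of the module, of shape \text{normalized\_shape}, when elementwise_affine is set to True. The values are initialized to 1.
~LayerNorm.bias - the learnable bias of the module, of shape \text{normalized\_shape}, when elementwise_affine is set to True. The values are initialized to 0.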
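The snippet below checks these two attributes on a freshly constructed module. `normalized_shape=[3, 5]` is just an illustrative choice; both tensors take exactly that shape and start at ones and zeros respectively.

```python
import torch
from torch import nn

ln = nn.LayerNorm([3, 5])

# weight (gamma) and bias (beta) both have shape normalized_shape
print(ln.weight.shape, ln.bias.shape)  # torch.Size([3, 5]) torch.Size([3, 5])
# Initial values: ones for weight, zeros for bias
print(torch.all(ln.weight == 1).item(), torch.all(ln.bias == 0).item())  # True True
# Both are trainable parameters
print(ln.weight.requires_grad, ln.bias.requires_grad)  # True True
```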

Applies Layer Normalization over a mini-batch of inputs, as described in the paper Layer Normalization:

y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta
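To make the formula concrete, here is a small worked example on one 4-element vector with \gamma = 1 and \beta = 0 (the initial values). For x = [1, 2, 3, 4], E[x] = 2.5 and Var[x] = (2.25 + 0.25 + 0.25 + 2.25) / 4 = 1.25, so each element becomes (x - 2.5) / sqrt(1.25 + eps). The sketch computes this by hand and compares it with torch.nn.functional.layer_norm.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
eps = 1e-5

mean = x.mean()                        # E[x] = 2.5
var = ((x - mean) ** 2).mean()         # Var[x] = 1.25 (divides by n, not n - 1)
y_manual = (x - mean) / torch.sqrt(var + eps)  # gamma = 1, beta = 0

y_builtin = F.layer_norm(x, normalized_shape=(4,), eps=eps)
print(torch.allclose(y_manual, y_builtin, atol=1e-6))  # True
```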

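The mean and standard deviation are calculated over the last D dimensions, where D is the number of dimensions of normalized_shape. For example, if normalized_shape is (3, 5) (a 2-dimensional shape), the mean and standard deviation are computed over the last 2 dimensions of the input (i.e. input.mean((-2, -1))). \gamma and \beta are learnable affine transform parameters of shape normalized_shape if elementwise_affine is True. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).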
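The following sketch reproduces nn.LayerNorm((3, 5)) by hand for a batch of four (3, 5) inputs, with the statistics taken over the last two dimensions and the biased variance. The batch size, random seed and the random \gamma / \beta values are assumptions made only so that the affine step is actually exercised. The last lines show that swapping in the unbiased estimator (dividing by n - 1 = 14) no longer matches.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 3, 5)          # (*, 3, 5) with a leading batch dim of 4
ln = nn.LayerNorm((3, 5))

# Give gamma/beta non-trivial values so the affine step matters
with torch.no_grad():
    ln.weight.uniform_(0.5, 1.5)
    ln.bias.uniform_(-0.5, 0.5)

# Statistics over the last D = 2 dims, kept for broadcasting
mean = x.mean(dim=(-2, -1), keepdim=True)
var = x.var(dim=(-2, -1), unbiased=False, keepdim=True)  # biased estimator
y_manual = (x - mean) / torch.sqrt(var + ln.eps) * ln.weight + ln.bias
print(torch.allclose(ln(x), y_manual, atol=1e-5))  # True

# The unbiased estimator gives a different result
var_unbiased = x.var(dim=(-2, -1), unbiased=True, keepdim=True)
y_wrong = (x - mean) / torch.sqrt(var_unbiased + ln.eps) * ln.weight + ln.bias
print(torch.allclose(ln(x), y_wrong, atol=1e-5))  # False
```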

Note

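Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias to each entire channel/plane via the affine option, Layer Normalization applies a per-element scale and bias via elementwise_affine.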
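One way to see this difference is to compare parameter shapes for the image setting used later in the examples (N, C, H, W = 20, 5, 10, 10): nn.BatchNorm2d keeps one scale per channel, while nn.LayerNorm([C, H, W]) keeps one scale per element.

```python
from torch import nn

N, C, H, W = 20, 5, 10, 10
bn = nn.BatchNorm2d(C)          # affine=True: one scale/shift per channel
ln = nn.LayerNorm([C, H, W])    # elementwise_affine=True: one scale/shift per element

print(bn.weight.shape)  # torch.Size([5])
print(ln.weight.shape)  # torch.Size([5, 10, 10])
```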

This layer uses statistics computed from the input data in both training and evaluation modes.
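A quick check of this behavior: the sketch below runs the same input through one module in train() and eval() mode and gets identical outputs, and shows that LayerNorm registers no running-statistics buffers, unlike BatchNorm1d. The input shape (8, 16) and seed are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 16)
ln = nn.LayerNorm(16)

ln.train()
out_train = ln(x)
ln.eval()
out_eval = ln(x)

# Statistics always come from the current input, so the mode does not matter
print(torch.equal(out_train, out_eval))  # True
# No running-mean / running-var buffers, unlike BatchNorm
print([name for name, _ in ln.named_buffers()])  # []
print([name for name, _ in nn.BatchNorm1d(16).named_buffers()])
```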

Shape:

Input: (N, *)
Output: (N, *) (same shape as the input)
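The sketch below illustrates the shape contract: any number of leading dimensions is accepted and preserved, but the trailing dimensions must match normalized_shape, otherwise the call fails with a RuntimeError. The particular shapes are arbitrary examples.

```python
import torch
from torch import nn

ln = nn.LayerNorm(10)

# Leading dims are free; only the trailing dim must equal 10
for shape in [(4, 10), (2, 3, 10), (8, 5, 6, 10)]:
    out = ln(torch.randn(*shape))
    assert out.shape == torch.Size(shape)  # output shape == input shape

# A trailing dim that does not match normalized_shape is rejected
raised = False
try:
    ln(torch.randn(4, 7))
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```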

Examples:

>>> import torch
>>> from torch import nn
>>>
>>> # NLP Example
>>> batch, sentence_length, embedding_dim = 20, 5, 10
>>> embedding = torch.randn(batch, sentence_length, embedding_dim)
>>> layer_norm = nn.LayerNorm(embedding_dim)
>>> # Activate module: normalizes each token's embedding vector
>>> output = layer_norm(embedding)
>>>
>>> # Image Example
>>> N, C, H, W = 20, 5, 10, 10
>>> input = torch.randn(N, C, H, W)
>>> # Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
>>> layer_norm = nn.LayerNorm([C, H, W])
>>> output = layer_norm(input)
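To confirm what the NLP example produces, this sketch reruns it with a fixed seed (an assumption for reproducibility) and checks that, with the initial \gamma = 1 and \beta = 0, every token's embedding vector comes out with mean close to 0 and biased variance close to 1. The variance sits slightly below 1 because of eps, hence the looser tolerance.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)

out = layer_norm(embedding)

# Per-token statistics over the embedding dimension
token_mean = out.mean(dim=-1)
token_var = out.var(dim=-1, unbiased=False)
print(out.shape)  # torch.Size([20, 5, 10])
print(torch.allclose(token_mean, torch.zeros_like(token_mean), atol=1e-5))  # True
print(torch.allclose(token_var, torch.ones_like(token_var), atol=1e-3))     # True
```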


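Note: This article was selected and compiled by 纯净天空 from the original English work torch.nn.LayerNorm on pytorch.org. Unless otherwise stated, the copyright of the original code belongs to the original author. Do not reproduce or copy this translation without permission or authorization.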