Python PyTorch LazyModuleMixin: usage and code examples

This article briefly introduces the usage of torch.nn.modules.lazy.LazyModuleMixin in Python.

Usage:
class torch.nn.modules.lazy.LazyModuleMixin(*args, **kwargs)

A mixin for modules that lazily initialize their parameters, also known as "lazy modules".

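Modules that lazily initialize their parameters, or "lazy modules", derive the shapes of their parameters from the first input(s) passed to their forward method. Until that first forward call they contain torch.nn.UninitializedParameter objects, which should not be accessed or used; afterward they contain regular torch.nn.Parameter objects. Lazy modules are convenient because they do not require you to compute some module arguments up front, such as the in_features argument of a typical torch.nn.Linear.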
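To make the mechanism concrete, here is a minimal sketch of a custom lazy module built directly on the mixin. The module name LazyScale and its behavior (multiplying each input feature by a learned weight) are hypothetical and exist only for illustration. The sketch assumes the mixin's documented contract: LazyModuleMixin is listed first in the base classes, parameters start as UninitializedParameter, and an initialize_parameters method materializes them from the first input. cls_to_become is left unset, so the module keeps its class after initialization.

```python
import torch
from torch import nn
from torch.nn.modules.lazy import LazyModuleMixin
from torch.nn.parameter import UninitializedParameter, is_lazy


class LazyScale(LazyModuleMixin, nn.Module):
    """Hypothetical lazy module: scales each input feature by a learned weight.

    The number of features is never passed to __init__; it is inferred
    from the last dimension of the first input.
    """

    def __init__(self):
        super().__init__()
        # Shapeless placeholder; must not be used before it is materialized.
        self.weight = UninitializedParameter()

    def initialize_parameters(self, input):
        # Called by the mixin's forward pre-hook before the first forward().
        if self.has_uninitialized_params():
            with torch.no_grad():
                self.weight.materialize((input.shape[-1],))
                nn.init.ones_(self.weight)

    def forward(self, input):
        return input * self.weight


m = LazyScale()
print(is_lazy(m.weight))   # True: still an UninitializedParameter
y = m(torch.ones(2, 3))    # the first call triggers initialize_parameters
print(is_lazy(m.weight))   # False: now a regular Parameter
print(m.weight.shape)      # torch.Size([3])
```

Shape inference happens in a forward pre-hook, so the module has to be called as m(x). Calling m.forward(x) directly would skip the hook and leave the parameter uninitialized.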

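After construction, a network containing lazy modules should first be converted to the desired dtype and placed on the intended device. Lazy modules only defer shape inference, so the usual dtype and device placement behavior still applies. Next, the lazy modules should perform "dry runs" to initialize all of their components. A dry run sends an input of the correct size, dtype, and device through the network, reaching each of its lazy modules. After that, the network can be used as usual.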

>>> class LazyMLP(torch.nn.Module):
...    def __init__(self):
...        super().__init__()
...        self.fc1 = torch.nn.LazyLinear(10)
...        self.relu1 = torch.nn.ReLU()
...        self.fc2 = torch.nn.LazyLinear(1)
...        self.relu2 = torch.nn.ReLU()
...
...    def forward(self, input):
...        x = self.relu1(self.fc1(input))
...        y = self.relu2(self.fc2(x))
...        return y
>>> # constructs a network with lazy modules
>>> lazy_mlp = LazyMLP()
>>> # transforms the network's device and dtype
>>> # NOTE: these transforms can and should be applied after construction and before any 'dry runs'
>>> lazy_mlp = lazy_mlp.cuda().double()
>>> lazy_mlp
LazyMLP(
  (fc1): LazyLinear(in_features=0, out_features=10, bias=True)
  (relu1): ReLU()
  (fc2): LazyLinear(in_features=0, out_features=1, bias=True)
  (relu2): ReLU()
)
>>> # performs a dry run to initialize the network's lazy modules
>>> lazy_mlp(torch.ones(10, 10, dtype=torch.double, device="cuda"))
>>> # after initialization, LazyLinear modules become regular Linear modules
>>> lazy_mlp
LazyMLP(
  (fc1): Linear(in_features=10, out_features=10, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=10, out_features=1, bias=True)
  (relu2): ReLU()
)
>>> # attaches an optimizer, since parameters can now be used as usual
>>> optim = torch.optim.SGD(lazy_mlp.parameters(), lr=0.01)
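The same workflow can be sketched on a machine without a GPU. This is a minimal CPU-only variant that assumes an nn.Sequential stand-in for LazyMLP and uses torch.nn.parameter.is_lazy to confirm that no uninitialized parameters remain before the optimizer is attached.

```python
import torch
from torch import nn
from torch.nn.parameter import is_lazy

# Stand-in for LazyMLP: in_features of both linear layers is never given.
net = nn.Sequential(nn.LazyLinear(10), nn.ReLU(), nn.LazyLinear(1))

# 1) Convert dtype (and device) first; only shape inference is deferred.
net = net.double()

# 2) Dry run with an input of the right size, dtype, and device.
net(torch.ones(4, 7, dtype=torch.double))

# 3) Every parameter is now a regular Parameter, so an optimizer can use them.
assert not any(is_lazy(p) for p in net.parameters())
optim = torch.optim.SGD(net.parameters(), lr=0.01)

print(net[0])  # Linear(in_features=7, out_features=10, bias=True)
```

Because the dtype conversion happens before the dry run, the materialized weights are float64, matching the input.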

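A final caveat when using lazy modules is that the initialization order of a network's parameters may change, because lazy modules are always initialized after other modules. For example, if the LazyMLP class defined above had a torch.nn.LazyLinear module first and a regular torch.nn.Linear module second, the second module would be initialized at construction time and the first module during the first dry run. As a result, the parameters of a network that uses lazy modules can be initialized differently from those of an equivalent network without lazy modules, because the order of parameter initialization (which usually depends on a stateful random number generator) is different. See the PyTorch Reproducibility notes for more details.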
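The effect on the random number stream can be sketched with three small networks built from the same seed. This assumes, as described above, that constructing a LazyLinear draws no random numbers, so in the lazy network the regular Linear(4, 2) is the first module to consume the generator.

```python
import torch
from torch import nn

# Reference: a lone Linear(4, 2) built right after seeding.
torch.manual_seed(0)
ref = nn.Linear(4, 2)

# Eager network: Linear(3, 4) draws from the RNG first, then Linear(4, 2).
torch.manual_seed(0)
eager = nn.Sequential(nn.Linear(3, 4), nn.Linear(4, 2))

# Lazy network: the LazyLinear draws nothing at construction time,
# so Linear(4, 2) gets the first random numbers after the seed.
torch.manual_seed(0)
lazy = nn.Sequential(nn.LazyLinear(4), nn.Linear(4, 2))
lazy(torch.ones(1, 3))  # dry run: only now is the first layer initialized

# The lazy network's second layer matches ref; the eager network's does not.
print(torch.equal(lazy[1].weight, ref.weight))
print(torch.equal(eager[1].weight, ref.weight))
```

Same seed, same architecture, different second-layer weights: this is the reproducibility difference the paragraph warns about.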

Lazy modules can be serialized with a state dict, just like other modules. For example:

>>> lazy_mlp = LazyMLP()
>>> # The state dict shows the uninitialized parameters
>>> lazy_mlp.state_dict()
OrderedDict([('fc1.weight', Uninitialized parameter),
             ('fc1.bias',
              tensor([-1.8832e+25,  4.5636e-41, -1.8832e+25,  4.5636e-41, -6.1598e-30,
                       4.5637e-41, -1.8788e+22,  4.5636e-41, -2.0042e-31,  4.5637e-41])),
             ('fc2.weight', Uninitialized parameter),
             ('fc2.bias', tensor([0.0019]))])

Lazy modules can load regular torch.nn.Parameter values (that is, you can serialize and deserialize initialized LazyModules and they will remain initialized):

>>> full_mlp = LazyMLP()
>>> # Dry run to initialize another module
>>> full_mlp(torch.ones(10, 1))
>>> # Load an initialized state into a lazy module
>>> lazy_mlp.load_state_dict(full_mlp.state_dict())
>>> # The state dict now holds valid values
>>> lazy_mlp.state_dict()
OrderedDict([('fc1.weight',
              tensor([[-0.3837],
                      [ 0.0907],
                      [ 0.6708],
                      [-0.5223],
                      [-0.9028],
                      [ 0.2851],
                      [-0.4537],
                      [ 0.6813],
                      [ 0.5766],
                      [-0.8678]])),
             ('fc1.bias',
              tensor([-1.8832e+25,  4.5636e-41, -1.8832e+25,  4.5636e-41, -6.1598e-30,
                       4.5637e-41, -1.8788e+22,  4.5636e-41, -2.0042e-31,  4.5637e-41])),
             ('fc2.weight',
              tensor([[ 0.1320,  0.2938,  0.0679,  0.2793,  0.1088, -0.1795, -0.2301,  0.2807,
                        0.2479,  0.1091]])),
             ('fc2.bias', tensor([0.0019]))])
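The load path can be sketched with two single-layer networks, assuming an nn.Sequential stand-in for LazyMLP. Loading the initialized state dict materializes the still-uninitialized parameters of the target module with the loaded shapes and copies the values in.

```python
import torch
from torch import nn

# Two networks with identical structure (stand-ins for LazyMLP).
lazy = nn.Sequential(nn.LazyLinear(3))
full = nn.Sequential(nn.LazyLinear(3))
full(torch.ones(2, 5))  # dry run: full[0].weight becomes a (3, 5) Parameter

# Loading materializes lazy's parameters with the shapes found in the dict.
lazy.load_state_dict(full.state_dict())

print(lazy[0].has_uninitialized_params())  # False
print(lazy[0].weight.shape)                # torch.Size([3, 5])
```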

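Note, however, that if parameters were already initialized when the state was loaded, a later "dry run" will not replace them. This prevents an initialized module from being reused in a different context.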
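A sketch of this behavior, again assuming an nn.Sequential stand-in: after loading weights from a network that saw 5 input features, a dry run with 5 features keeps the loaded values, and a dry run with 7 features fails with a RuntimeError instead of re-inferring the shape.

```python
import torch
from torch import nn

full = nn.Sequential(nn.LazyLinear(3))
full(torch.ones(2, 5))  # initialized for 5 input features

lazy = nn.Sequential(nn.LazyLinear(3))
lazy.load_state_dict(full.state_dict())
loaded = lazy[0].weight.detach().clone()

# A dry run with the matching feature size keeps the loaded weights.
lazy(torch.ones(2, 5))
kept = torch.equal(lazy[0].weight, loaded)

# A dry run with a different feature size does not re-infer shapes;
# the loaded (3, 5) weight cannot accept 7 features, so the call fails.
raised = False
try:
    lazy(torch.ones(2, 7))
except RuntimeError as err:
    raised = True
    print("shape mismatch:", err)

print(kept, raised)
```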


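Note: This article was selected and compiled by 纯净天空 from the original English documentation of torch.nn.modules.lazy.LazyModuleMixin on pytorch.org. Unless otherwise stated, the copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.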