Python PyTorch LazyModuleMixin Usage and Code Examples

This article briefly introduces how to use torch.nn.modules.lazy.LazyModuleMixin in Python.

Usage:
class torch.nn.modules.lazy.LazyModuleMixin(*args, **kwargs)

A mixin for modules that lazily initialize their parameters, also known as "lazy modules".
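To build a lazy module directly on the mixin, a module can register UninitializedParameter placeholders and implement initialize_parameters, which the mixin calls from a forward pre-hook before the first forward pass (this is how torch.nn.LazyLinear is structured). The sketch below assumes that contract as it exists in recent PyTorch releases; LazyScale is a hypothetical module invented for illustration and is not part of PyTorch.

```python
import torch
from torch.nn.modules.lazy import LazyModuleMixin
from torch.nn.parameter import UninitializedParameter, is_lazy


class LazyScale(LazyModuleMixin, torch.nn.Module):
    """Hypothetical example: multiplies each input feature by a learned scale."""

    def __init__(self):
        super().__init__()
        # Placeholder parameter: its shape is unknown until the first forward.
        self.scale = UninitializedParameter()

    def initialize_parameters(self, input):
        # Called by the mixin's forward pre-hook before the first forward call.
        if self.has_uninitialized_params():
            with torch.no_grad():
                # Shape is inferred from the last dimension of the first input.
                self.scale.materialize((input.shape[-1],))
                self.scale.fill_(1.0)

    def forward(self, input):
        return input * self.scale


module = LazyScale()
print(is_lazy(module.scale))  # True
out = module(torch.ones(2, 3))
print(is_lazy(module.scale), module.scale.shape)  # False torch.Size([3])
```

Listing LazyModuleMixin before torch.nn.Module in the base classes mirrors how PyTorch's own lazy layers are declared, so the mixin's __init__ runs and registers its hooks.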

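Modules that lazily initialize parameters, or "lazy modules", derive the shapes of their parameters from the first input(s) passed to their forward method. Until that first forward call they contain torch.nn.UninitializedParameter objects, which should not be accessed or used; afterward they contain regular torch.nn.Parameter objects. Lazy modules are convenient because they do not require you to compute some module arguments up front, such as the in_features argument of a typical torch.nn.Linear.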
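A quick way to see this transition is to inspect a torch.nn.LazyLinear before and after its first forward call. This is a small sketch; the layer sizes and the random input are arbitrary choices for illustration.

```python
import torch
from torch.nn.parameter import UninitializedParameter

layer = torch.nn.LazyLinear(out_features=5)
# Before the first forward call the weight is only a placeholder.
print(isinstance(layer.weight, UninitializedParameter))  # True

out = layer(torch.randn(8, 3))  # in_features is inferred as 3 from the input

print(isinstance(layer.weight, UninitializedParameter))  # False
print(layer.weight.shape)        # torch.Size([5, 3])
print(type(layer).__name__)      # Linear
print(out.shape)                 # torch.Size([8, 5])
```

Note that the module's class also changes from LazyLinear to Linear once its parameters are initialized, which matches the repr shown in the example further below.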

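After construction, a network with lazy modules should first be converted to the desired dtype and placed on the expected device. Lazy modules only perform shape inference, so the usual dtype and device placement behavior applies to them. The network should then perform "dry runs" to initialize all of its lazy modules. A dry run sends inputs of the correct size, dtype, and device through the network and therefore to each of its lazy modules. After this, the network can be used as usual.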

>>> import torch
>>> class LazyMLP(torch.nn.Module):
...    def __init__(self):
...        super().__init__()
...        self.fc1 = torch.nn.LazyLinear(10)
...        self.relu1 = torch.nn.ReLU()
...        self.fc2 = torch.nn.LazyLinear(1)
...        self.relu2 = torch.nn.ReLU()
...
...    def forward(self, input):
...        x = self.relu1(self.fc1(input))
...        y = self.relu2(self.fc2(x))
...        return y
>>> # constructs a network with lazy modules
>>> lazy_mlp = LazyMLP()
>>> # transforms the network's device and dtype
>>> # NOTE: these transforms can and should be applied after construction and before any 'dry runs'
>>> lazy_mlp = lazy_mlp.cuda().double()
>>> lazy_mlp
LazyMLP(
  (fc1): LazyLinear(in_features=0, out_features=10, bias=True)
  (relu1): ReLU()
  (fc2): LazyLinear(in_features=0, out_features=1, bias=True)
  (relu2): ReLU()
)
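>>> # performs a dry run to initialize the network's lazy modules;
>>> # the input must match the network's dtype (double) and device (CUDA)
>>> _ = lazy_mlp(torch.ones(10, 10, dtype=torch.double, device="cuda"))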
>>> # after initialization, LazyLinear modules become regular Linear modules
>>> lazy_mlp
LazyMLP(
  (fc1): Linear(in_features=10, out_features=10, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=10, out_features=1, bias=True)
  (relu2): ReLU()
)
>>> # attaches an optimizer, since parameters can now be used as usual
>>> optim = torch.optim.SGD(lazy_mlp.parameters(), lr=0.01)
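The same convert-then-dry-run workflow can be sketched without a GPU. This variant is an adaptation under stated assumptions: it uses torch.nn.Sequential instead of the LazyMLP class, stays on the CPU, and only changes the dtype to float64; on a CUDA machine you would also move the network to the GPU before the dry run.

```python
import torch

# Same layer layout as LazyMLP above, written with nn.Sequential for brevity.
net = torch.nn.Sequential(
    torch.nn.LazyLinear(10),
    torch.nn.ReLU(),
    torch.nn.LazyLinear(1),
    torch.nn.ReLU(),
)

# 1. Convert dtype (and device) while the parameters are still uninitialized.
#    Add .cuda() or .to(device) here if a GPU is the target.
net = net.double()

# 2. Dry run: the input must already have the network's dtype and device.
net(torch.ones(10, 10, dtype=torch.double))

# 3. Parameters now exist with the dtype chosen in step 1 and can be optimized.
print(net[0].weight.shape, net[0].weight.dtype)  # torch.Size([10, 10]) torch.float64
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
```

Creating the optimizer only after the dry run matters: before it, the parameters are placeholders that should not be used.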

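A final caveat when using lazy modules is that the initialization order of a network's parameters may change, because lazy modules are always initialized after other modules. For example, if the LazyMLP class defined above had a torch.nn.LazyLinear module first and a regular torch.nn.Linear module second, the second module would be initialized at construction time and the first module would be initialized during the first dry run. As a result, the parameters of a network that uses lazy modules can be initialized differently from those of an equivalent network without lazy modules, since the order of parameter initialization (which usually depends on a stateful random number generator) differs. See the PyTorch Reproducibility notes for more details.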
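The reordering can be observed by seeding the global random number generator and comparing where the regular Linear layer draws its values. This sketch assumes, as in recent PyTorch releases, that constructing a LazyLinear draws no random numbers (its reset_parameters is skipped while in_features is 0).

```python
import torch

# LazyLinear first, regular Linear second: the scenario described above.
torch.manual_seed(0)
lazy_net = torch.nn.Sequential(torch.nn.LazyLinear(4), torch.nn.Linear(4, 2))
lazy_net(torch.ones(1, 3))  # dry run: the LazyLinear is initialized last

# Equivalent network without lazy modules, built from the same seed.
torch.manual_seed(0)
eager_net = torch.nn.Sequential(torch.nn.Linear(3, 4), torch.nn.Linear(4, 2))

# A lone Linear(4, 2) built right after seeding, for reference.
torch.manual_seed(0)
fresh = torch.nn.Linear(4, 2)

# In lazy_net, the regular Linear drew the first random numbers after seeding...
print(torch.equal(lazy_net[1].weight, fresh.weight))  # True
# ...whereas in eager_net it drew them only after the first layer was initialized.
print(torch.equal(lazy_net[1].weight, eager_net[1].weight))  # False
```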

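Lazy modules can be serialized with a state dict like any other module. For example (the printed values are arbitrary and will differ between runs):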

>>> lazy_mlp = LazyMLP()
>>> # The state dict shows the uninitialized parameters
>>> lazy_mlp.state_dict()
OrderedDict([('fc1.weight', Uninitialized parameter),
             ('fc1.bias',
              tensor([-1.8832e+25,  4.5636e-41, -1.8832e+25,  4.5636e-41, -6.1598e-30,
                       4.5637e-41, -1.8788e+22,  4.5636e-41, -2.0042e-31,  4.5637e-41])),
             ('fc2.weight', Uninitialized parameter),
             ('fc2.bias', tensor([0.0019]))])
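The uninitialized entries can also be detected programmatically with torch.nn.parameter.is_lazy. This sketch uses an nn.Sequential stand-in for LazyMLP (an assumption made to keep it self-contained), and only checks the weights, since whether LazyLinear biases are also placeholders has varied across PyTorch versions.

```python
import torch
from torch.nn.parameter import is_lazy

lazy = torch.nn.Sequential(
    torch.nn.LazyLinear(10), torch.nn.ReLU(), torch.nn.LazyLinear(1)
)
sd = lazy.state_dict()
print(list(sd.keys()))  # ['0.weight', '0.bias', '2.weight', '2.bias']

# The weights have no shape yet, so the state dict holds lazy placeholders.
print(is_lazy(sd["0.weight"]), is_lazy(sd["2.weight"]))  # True True
```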

Lazy modules can load regular torch.nn.Parameter values (that is, you can serialize and deserialize initialized LazyModules and they will remain initialized):

>>> full_mlp = LazyMLP()
>>> # Dry run to initialize another module
>>> _ = full_mlp(torch.ones(10, 1))
>>> # Load an initialized state into a lazy module
>>> lazy_mlp.load_state_dict(full_mlp.state_dict())
<All keys matched successfully>
>>> # The state dict now holds valid values
>>> lazy_mlp.state_dict()
OrderedDict([('fc1.weight',
              tensor([[-0.3837],
                      [ 0.0907],
                      [ 0.6708],
                      [-0.5223],
                      [-0.9028],
                      [ 0.2851],
                      [-0.4537],
                      [ 0.6813],
                      [ 0.5766],
                      [-0.8678]])),
             ('fc1.bias',
              tensor([-1.8832e+25,  4.5636e-41, -1.8832e+25,  4.5636e-41, -6.1598e-30,
                       4.5637e-41, -1.8788e+22,  4.5636e-41, -2.0042e-31,  4.5637e-41])),
             ('fc2.weight',
              tensor([[ 0.1320,  0.2938,  0.0679,  0.2793,  0.1088, -0.1795, -0.2301,  0.2807,
                        0.2479,  0.1091]])),
             ('fc2.bias', tensor([0.0019]))])
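The same holds when the state dict goes through an actual serialize/deserialize round trip. In this sketch an in-memory io.BytesIO buffer stands in for a file, and make_net is a hypothetical helper that rebuilds the same all-lazy architecture.

```python
import io

import torch


def make_net():
    # Hypothetical helper: same architecture every time, all Linear layers lazy.
    return torch.nn.Sequential(
        torch.nn.LazyLinear(10), torch.nn.ReLU(), torch.nn.LazyLinear(1)
    )


trained = make_net()
trained(torch.ones(10, 1))  # dry run: weights become (10, 1) and (1, 10)

# Serialize the initialized state dict and read it back.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)
state = torch.load(buffer)

restored = make_net()            # every parameter is still a placeholder
restored.load_state_dict(state)  # placeholders are materialized from the loaded shapes
print(restored[0].weight.shape)  # torch.Size([10, 1])
```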

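Note, however, that if the parameters are already initialized when the state is loaded, they are not replaced when a "dry run" is performed. This prevents initialized modules from being used in different contexts.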
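Because the loaded parameters are kept, a later dry run does not re-infer shapes from its input. The sketch below (again with a hypothetical make_net helper and nn.Sequential in place of LazyMLP) shows that an input with a different feature count fails instead of silently re-initializing the layer, while an input of the original width leaves the loaded values untouched.

```python
import torch


def make_net():
    # Hypothetical helper: same architecture every time, all Linear layers lazy.
    return torch.nn.Sequential(
        torch.nn.LazyLinear(4), torch.nn.ReLU(), torch.nn.LazyLinear(1)
    )


source = make_net()
source(torch.ones(2, 3))  # source infers in_features = 3

target = make_net()
target.load_state_dict(source.state_dict())  # target weights: (4, 3) and (1, 4)
loaded_weight = target[0].weight.detach().clone()

# A dry run with 5 features does not re-infer the shape; the matmul fails instead.
try:
    target(torch.ones(2, 5))
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)  # True

# A dry run with the original width keeps the loaded values.
target(torch.ones(2, 3))
print(torch.equal(target[0].weight, loaded_weight))  # True
```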


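Note: This article was selected and compiled by 純淨天空 from the original English work torch.nn.modules.lazy.LazyModuleMixin on pytorch.org. Unless otherwise stated, the copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.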