Python PyTorch Transformer: Usage and Code Examples

This article briefly introduces the usage of torch.nn.Transformer in Python.

Usage: class torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation=<function relu>, custom_encoder=None, custom_decoder=None, layer_norm_eps=1e-05, batch_first=False, norm_first=False, device=None, dtype=None)

Parameters:

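d_model - the number of expected features in the encoder/decoder inputs (default=512).
nhead - the number of heads in the multi-head attention models (default=8).
num_encoder_layers - the number of sub-encoder-layers in the encoder (default=6).
num_decoder_layers - the number of sub-decoder-layers in the decoder (default=6).
dim_feedforward - the dimension of the feedforward network model (default=2048).
dropout - the dropout value (default=0.1).
activation - the activation function of the encoder/decoder intermediate layer; either a string ("relu" or "gelu") or a unary callable. Default: relu.
custom_encoder - a custom encoder (default=None).
custom_decoder - a custom decoder (default=None).
layer_norm_eps - the eps value in the layer normalization components (default=1e-5).
batch_first - if True, the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature).
norm_first - if True, the encoder and decoder layers apply LayerNorm before the attention and feedforward operations; otherwise they apply it afterwards. Default: False (after).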
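
As a rough illustration of how several of these parameters fit together, the sketch below builds a deliberately small model with batch_first=True, norm_first=True and a string-valued activation. All sizes are arbitrary values picked to keep the example fast; they are not recommendations.

```python
import torch
from torch import nn

# Small, arbitrary sizes chosen only to keep the example quick to run.
model = nn.Transformer(
    d_model=64,             # feature size of every input/output vector
    nhead=4,                # must divide d_model evenly (64 / 4 = 16 per head)
    num_encoder_layers=2,
    num_decoder_layers=2,
    dim_feedforward=128,
    dropout=0.1,
    activation="gelu",      # string form; a unary callable is also accepted
    batch_first=True,       # tensors are laid out as (batch, seq, feature)
    norm_first=True,        # LayerNorm before attention/feedforward
)

src = torch.rand(8, 10, 64)   # (batch=8, src_len=10, d_model=64)
tgt = torch.rand(8, 20, 64)   # (batch=8, tgt_len=20, d_model=64)
out = model(src, tgt)

# The output has the same shape as the target input.
print(out.shape)
```

With batch_first left at its default of False, the same tensors would instead need the layout (seq, batch, feature).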

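A transformer model. Users can modify its attributes as needed. The architecture is based on the paper "Attention Is All You Need": Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users can build the BERT (https://arxiv.org/abs/1810.04805) model with corresponding parameters.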
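
One possible reading of the BERT remark is sketched below. Because BERT is an encoder-only architecture, this sketch uses nn.TransformerEncoder rather than the full encoder-decoder nn.Transformer, with sizes matching the published BERT-base configuration (hidden size 768, 12 heads, 12 layers, feedforward size 3072). The helper name build_bert_like_encoder is hypothetical, and the embeddings and pretraining heads that a real BERT needs are left out.

```python
import torch
from torch import nn


def build_bert_like_encoder(d_model=768, nhead=12, num_layers=12,
                            dim_feedforward=3072, dropout=0.1):
    """Hypothetical helper: an encoder-only stack with BERT-base-like sizes.

    Token/position embeddings and task heads are intentionally omitted.
    """
    layer = nn.TransformerEncoderLayer(
        d_model=d_model,
        nhead=nhead,
        dim_feedforward=dim_feedforward,
        dropout=dropout,
        activation="gelu",
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=num_layers)


# Use a shallow stack here so the demonstration runs quickly;
# the default num_layers=12 corresponds to the BERT-base depth.
encoder = build_bert_like_encoder(num_layers=2)
encoder.eval()

# Random stand-in for already-embedded tokens: (batch, seq, feature).
tokens = torch.rand(4, 16, 768)
with torch.no_grad():
    hidden = encoder(tokens)

print(hidden.shape)
```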

Example:

>>> import torch
>>> from torch import nn
>>> transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)
>>> src = torch.rand((10, 32, 512))
>>> tgt = torch.rand((20, 32, 512))
>>> out = transformer_model(src, tgt)
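
The example above passes no masks. In sequence-to-sequence training the decoder is usually prevented from attending to later target positions, so a variant with a causal target mask may be useful. The following is a minimal sketch with small, arbitrary sizes, using the default (seq, batch, feature) layout and the module's generate_square_subsequent_mask helper.

```python
import torch
from torch import nn

# Tiny model so the example runs quickly; the sizes are arbitrary.
model = nn.Transformer(d_model=32, nhead=4, num_encoder_layers=1,
                       num_decoder_layers=1, dim_feedforward=64)

src = torch.rand(10, 2, 32)   # (src_len=10, batch=2, d_model=32)
tgt = torch.rand(6, 2, 32)    # (tgt_len=6,  batch=2, d_model=32)

# Float mask of shape (tgt_len, tgt_len): 0.0 on and below the diagonal,
# -inf above it, so position i cannot attend to positions after i.
tgt_mask = model.generate_square_subsequent_mask(tgt.size(0))

out = model(src, tgt, tgt_mask=tgt_mask)
print(tgt_mask.shape, out.shape)
```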

Note: A full example that applies the nn.Transformer module to a word language model is available at https://github.com/pytorch/examples/tree/master/word_language_model

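Note: This article was selected and compiled by 純淨天空 from the original English documentation for torch.nn.Transformer on pytorch.org. Unless otherwise stated, copyright of the original code belongs to the original author; do not reproduce or copy this translation without permission or authorization.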