Python PyTorch KJTAllToAll Usage and Code Examples

This article briefly introduces the usage of torchrec.distributed.dist_data.KJTAllToAll in Python.

Usage: class torchrec.distributed.dist_data.KJTAllToAll(pg: torch._C._distributed_c10d.ProcessGroup, splits: List[int], device: Optional[torch.device] = None, stagger: int = 1)

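Parameters:

pg (dist.ProcessGroup) - ProcessGroup used for the AlltoAll communication.
splits (List[int]) - List of length pg.size() indicating how many features to send to each pg.rank(). The KeyedJaggedTensor is assumed to be ordered by destination rank. The list is the same on all ranks.
device (Optional[torch.device]) - Device on which buffers will be allocated.
stagger (int) - Stagger value to apply to the recat tensor; see the _recat function for more details.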
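
The meaning of splits can be illustrated with a small pure-Python sketch. It is not part of torchrec: the helper name keys_by_destination is hypothetical, and the check that sum(splits) equals the number of keys is an assumption of this sketch. It only shows how consecutive runs of keys, already ordered by destination rank, map onto ranks.

```python
from typing import Dict, List


def keys_by_destination(keys: List[str], splits: List[int]) -> Dict[int, List[str]]:
    """Assign consecutive runs of keys to ranks: splits[r] keys go to rank r.

    Hypothetical helper for illustration only; not a torchrec API.
    """
    if sum(splits) != len(keys):
        # Assumption of this sketch: every key has exactly one destination.
        raise ValueError("sum(splits) must equal the number of keys")
    assignment: Dict[int, List[str]] = {}
    start = 0
    for rank, count in enumerate(splits):
        assignment[rank] = keys[start:start + count]
        start += count
    return assignment


assignment = keys_by_destination(["A", "B", "C"], [2, 1])
print(assignment)  # {0: ['A', 'B'], 1: ['C']}
```

With keys ['A', 'B', 'C'] and splits [2, 1], rank 0 receives features 'A' and 'B' and rank 1 receives feature 'C'.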

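Bases: torch.nn.modules.module.Module

Redistributes a KeyedJaggedTensor across a ProcessGroup according to splits.

The implementation uses the AlltoAll collective from torch.distributed. It requires two collective calls: one to transmit the final tensor lengths (so that the correct amount of space can be allocated), and one to transmit the actual sparse values.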
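
The two-call pattern can be sketched in a single process without torch.distributed. The function simulated_all_to_all below is a hypothetical stand-in for the real collective, and the nested-list layout is an assumption chosen for readability, not the buffer layout torchrec uses. The data corresponds to a two-rank setup with keys A and B sent to rank 0 and key C sent to rank 1.

```python
from typing import List


def simulated_all_to_all(send: List[List[list]]) -> List[List[list]]:
    """Hypothetical single-process stand-in for an all-to-all collective.

    send[src][dst] is what rank `src` sends to rank `dst`;
    the result recv[dst][src] is what rank `dst` receives from rank `src`.
    """
    world = len(send)
    return [[send[src][dst] for src in range(world)] for dst in range(world)]


# Per-(key, sample) lengths, grouped by destination rank.
lengths_to_send = [
    [[1, 0, 2, 0, 1, 1], [1, 1, 0]],  # rank 0 -> rank 0 (keys A, B), rank 1 (key C)
    [[1, 1, 0, 0, 1, 2], [1, 1, 0]],  # rank 1 -> rank 0 (keys A, B), rank 1 (key C)
]
# Flattened values, grouped the same way.
values_to_send = [
    [["A.V0", "A.V1", "A.V2", "B.V0", "B.V1"], ["C.V0", "C.V1"]],
    [["A.V3", "A.V4", "B.V2", "B.V3", "B.V4"], ["C.V2", "C.V3"]],
]

# Call 1: exchange lengths so each receiver knows how much space to allocate.
recv_lengths = simulated_all_to_all(lengths_to_send)
expected_sizes = [[sum(chunk) for chunk in per_src] for per_src in recv_lengths]

# Call 2: exchange the values; each chunk must match the size announced in call 1.
recv_values = simulated_all_to_all(values_to_send)
for dst in range(2):
    for src in range(2):
        assert len(recv_values[dst][src]) == expected_sizes[dst][src]

print(expected_sizes)  # [[5, 5], [2, 2]]
```

The lengths have to travel first because jagged data has no fixed size per sample: without them the receiver cannot size its value buffer or slice the flat values back into per-sample lists.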

Example:

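from torchrec.distributed.dist_data import KJTAllToAll

# pg (an initialized ProcessGroup), device and rank0_input are assumed to exist
keys = ['A', 'B', 'C']
splits = [2, 1]  # features 'A' and 'B' go to rank 0, feature 'C' goes to rank 1
kjtA2A = KJTAllToAll(pg, splits, device)
awaitable = kjtA2A(rank0_input)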

# where:
# rank0_input is KeyedJaggedTensor holding

#         0           1           2
# 'A'    [A.V0]       None        [A.V1, A.V2]
# 'B'    None         [B.V0]      [B.V1]
# 'C'    [C.V0]       [C.V1]      None

# rank1_input is KeyedJaggedTensor holding

#         0           1           2
# 'A'     [A.V3]      [A.V4]      None
# 'B'     None        [B.V2]      [B.V3, B.V4]
# 'C'     [C.V2]      [C.V3]      None

rank0_output = awaitable.wait()

# where:
# rank0_output is KeyedJaggedTensor holding

#         0           1           2               3           4           5
# 'A'     [A.V0]      None        [A.V1, A.V2]    [A.V3]      [A.V4]      None
# 'B'     None        [B.V0]      [B.V1]          None        [B.V2]      [B.V3, B.V4]

# rank1_output is KeyedJaggedTensor holding
#         0           1           2           3           4           5
# 'C'     [C.V0]      [C.V1]      None        [C.V2]      [C.V3]      None
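
The input and output tables above can be reproduced with a plain-Python sketch that models each KeyedJaggedTensor as a dict from key to a list of per-sample entries. The redistribute function and the Jagged alias are hypothetical names introduced for illustration; they mimic only the resulting layout, not the torchrec implementation or its collectives.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a KeyedJaggedTensor: key -> one entry per sample.
Jagged = Dict[str, List[Optional[List[str]]]]


def redistribute(inputs: List[Jagged], keys: List[str], splits: List[int]) -> List[Jagged]:
    """Return one output per rank: the keys that rank owns, with the
    batches of rank 0, rank 1, ... concatenated along the batch dimension."""
    outputs: List[Jagged] = []
    start = 0
    for count in splits:
        owned = keys[start:start + count]
        start += count
        outputs.append(
            {k: [sample for rank_in in inputs for sample in rank_in[k]] for k in owned}
        )
    return outputs


rank0_input: Jagged = {
    "A": [["A.V0"], None, ["A.V1", "A.V2"]],
    "B": [None, ["B.V0"], ["B.V1"]],
    "C": [["C.V0"], ["C.V1"], None],
}
rank1_input: Jagged = {
    "A": [["A.V3"], ["A.V4"], None],
    "B": [None, ["B.V2"], ["B.V3", "B.V4"]],
    "C": [["C.V2"], ["C.V3"], None],
}

rank0_output, rank1_output = redistribute(
    [rank0_input, rank1_input], ["A", "B", "C"], [2, 1]
)
print(rank0_output["A"])  # [['A.V0'], None, ['A.V1', 'A.V2'], ['A.V3'], ['A.V4'], None]
print(rank1_output["C"])  # [['C.V0'], ['C.V1'], None, ['C.V2'], ['C.V3'], None]
```

Each rank ends up with fewer keys but a batch twice as long: columns 0 to 2 come from rank 0 and columns 3 to 5 from rank 1.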

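Note: This article was selected and compiled by 純淨天空 from the original English documentation for torchrec.distributed.dist_data.KJTAllToAll on pytorch.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.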