Python PyTorch KJTAllToAll: usage and code example

This article briefly describes how to use torchrec.distributed.dist_data.KJTAllToAll in Python.

Usage: class torchrec.distributed.dist_data.KJTAllToAll(pg: torch._C._distributed_c10d.ProcessGroup, splits: List[int], device: Optional[torch.device] = None, stagger: int = 1)

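Parameters:

pg (dist.ProcessGroup) - ProcessGroup used for the AlltoAll communication.
splits (List[int]) - List of length pg.size() giving the number of features to send to each rank in pg. The KeyedJaggedTensor is assumed to be ordered by destination rank. The same list is used on all ranks.
device (Optional[torch.device]) - Device on which buffers will be allocated.
stagger (int) - Stagger value applied to the recat tensor; see the _recat function for more detail.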
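To make the meaning of splits concrete, the following pure-Python sketch maps each destination rank to the keys it would receive. It is only an illustration: the helper name keys_per_rank is hypothetical and not part of torchrec, and it assumes that the keys are already ordered by destination rank and that the entries of splits sum to the number of keys.

```python
from typing import Dict, List


def keys_per_rank(keys: List[str], splits: List[int]) -> Dict[int, List[str]]:
    """Hypothetical helper: assign consecutive runs of keys to ranks.

    splits[r] is the number of keys (features) destined for rank r.
    Assumes keys are ordered by destination rank.
    """
    if sum(splits) != len(keys):
        raise ValueError("sum(splits) must equal the number of keys")
    assignment: Dict[int, List[str]] = {}
    start = 0
    for rank, count in enumerate(splits):
        assignment[rank] = keys[start:start + count]
        start += count
    return assignment


assignment = keys_per_rank(["A", "B", "C"], [2, 1])
print(assignment)  # {0: ['A', 'B'], 1: ['C']}
```

With keys ['A', 'B', 'C'] and splits [2, 1], rank 0 receives features 'A' and 'B' and rank 1 receives feature 'C'.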

Bases: torch.nn.modules.module.Module

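Redistributes a KeyedJaggedTensor across a ProcessGroup according to splits.

The implementation uses the AlltoAll collective from torch.distributed. It requires two collective calls: one to transmit the final tensor lengths (so that buffers of the correct size can be allocated), and one to transmit the actual sparse values.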
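The two-call pattern can be sketched in a single process without torch.distributed. The code below is not the torchrec implementation; the all_to_all function is a stand-in that simulates the collective with nested lists, and the data is made up. It shows why the lengths are exchanged first: each rank needs them to size its receive buffer before the values arrive.

```python
from typing import List


def all_to_all(send: List[List[list]]) -> List[List[list]]:
    """Single-process stand-in for an all-to-all collective.

    send[src][dst] is what rank `src` sends to rank `dst`;
    the result satisfies recv[dst][src] == send[src][dst].
    """
    world = len(send)
    return [[send[src][dst] for src in range(world)] for dst in range(world)]


# values_to_send[src][dst]: values rank `src` holds for rank `dst` (made-up data).
values_to_send = [
    [[1.0, 2.0], [3.0]],       # rank 0
    [[4.0], [5.0, 6.0, 7.0]],  # rank 1
]

# Call 1: exchange lengths so each rank knows how much it will receive.
counts_to_send = [[[len(chunk)] for chunk in row] for row in values_to_send]
counts_received = all_to_all(counts_to_send)
buffer_sizes = [sum(c[0] for c in row) for row in counts_received]

# Call 2: exchange the values into buffers sized from call 1.
values_received = all_to_all(values_to_send)
buffers = []
for rank, row in enumerate(values_received):
    buf = [0.0] * buffer_sizes[rank]
    pos = 0
    for chunk in row:
        buf[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    buffers.append(buf)

print(buffer_sizes)  # [3, 4]
print(buffers)       # [[1.0, 2.0, 4.0], [3.0, 5.0, 6.0, 7.0]]
```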

Example:

keys=['A','B','C']
splits=[2,1]
kjtA2A = KJTAllToAll(pg, splits, device)
awaitable = kjtA2A(rank0_input)

# where:
# rank0_input is KeyedJaggedTensor holding

#         0           1           2
# 'A'    [A.V0]       None        [A.V1, A.V2]
# 'B'    None         [B.V0]      [B.V1]
# 'C'    [C.V0]       [C.V1]      None

# rank1_input is KeyedJaggedTensor holding

#         0           1           2
# 'A'     [A.V3]      [A.V4]      None
# 'B'     None        [B.V2]      [B.V3, B.V4]
# 'C'     [C.V2]      [C.V3]      None

rank0_output = awaitable.wait()

# where:
# rank0_output is KeyedJaggedTensor holding

#         0           1           2           3           4           5
# 'A'     [A.V0]      None      [A.V1, A.V2]  [A.V3]      [A.V4]      None
# 'B'     None        [B.V0]    [B.V1]        None        [B.V2]      [B.V3, B.V4]

# rank1_output is KeyedJaggedTensor holding
#         0           1           2           3           4           5
# 'C'     [C.V0]      [C.V1]      None        [C.V2]      [C.V3]      None
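The input and output tables above can be reproduced with a small pure-Python model. This is a sketch for checking the expected layout, not torchrec code: plain dicts stand in for KeyedJaggedTensor, empty lists stand in for None, and the redistribute function is a hypothetical name that simulates both ranks in one process.

```python
from typing import Dict, List

# Stand-in for a KeyedJaggedTensor: key -> one list of values per sample.
Jagged = Dict[str, List[List[str]]]


def redistribute(inputs: List[Jagged], keys: List[str], splits: List[int]) -> List[Jagged]:
    """Hypothetical model of the redistribution shown in the tables.

    Each destination rank owns a consecutive run of keys (given by splits)
    and receives, for each owned key, the samples of every source rank
    concatenated in rank order.
    """
    outputs: List[Jagged] = []
    start = 0
    for count in splits:
        owned = keys[start:start + count]
        start += count
        outputs.append(
            {key: [sample for src in inputs for sample in src[key]] for key in owned}
        )
    return outputs


rank0_input = {
    "A": [["A.V0"], [], ["A.V1", "A.V2"]],
    "B": [[], ["B.V0"], ["B.V1"]],
    "C": [["C.V0"], ["C.V1"], []],
}
rank1_input = {
    "A": [["A.V3"], ["A.V4"], []],
    "B": [[], ["B.V2"], ["B.V3", "B.V4"]],
    "C": [["C.V2"], ["C.V3"], []],
}

rank0_output, rank1_output = redistribute(
    [rank0_input, rank1_input], keys=["A", "B", "C"], splits=[2, 1]
)
print(rank0_output["A"])  # [['A.V0'], [], ['A.V1', 'A.V2'], ['A.V3'], ['A.V4'], []]
print(rank1_output["C"])  # [['C.V0'], ['C.V1'], [], ['C.V2'], ['C.V3'], []]
```

Rank 0 ends up with features 'A' and 'B' over a batch of six samples (three from each source rank), and rank 1 ends up with feature 'C' over the same six samples.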


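Note: This article was selected and compiled by 纯净天空 from the original English documentation for torchrec.distributed.dist_data.KJTAllToAll on pytorch.org. Unless otherwise stated, copyright of the original code belongs to the original author. Do not reproduce or copy this translation without permission or authorization.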