Python PyTorch TwRwSparseFeaturesDist: Usage and Code Examples

This article briefly introduces the usage of torchrec.distributed.sharding.twrw_sharding.TwRwSparseFeaturesDist in Python.

Usage: class torchrec.distributed.sharding.twrw_sharding.TwRwSparseFeaturesDist(pg: torch._C._distributed_c10d.ProcessGroup, intra_pg: torch._C._distributed_c10d.ProcessGroup, id_list_features_per_rank: List[int], id_score_list_features_per_rank: List[int], id_list_feature_hash_sizes: List[int], id_score_list_feature_hash_sizes: List[int], device: Optional[torch.device] = None, has_feature_processor: bool = False)

Bases: torchrec.distributed.embedding_sharding.BaseSparseFeaturesDist[torchrec.distributed.embedding_types.SparseFeatures]

Bucketizes sparse features in TWRW (table-wise-row-wise) fashion and then redistributes them with an AlltoAll collective operation.

Constructor arguments:

pg (dist.ProcessGroup): ProcessGroup for AlltoAll communication.
intra_pg (dist.ProcessGroup): ProcessGroup within a single host group for AlltoAll communication.
id_list_features_per_rank (List[int]): number of id list features to send to each rank.
id_score_list_features_per_rank (List[int]): number of id score list features to send to each rank.
id_list_feature_hash_sizes (List[int]): hash sizes of the id list features.
id_score_list_feature_hash_sizes (List[int]): hash sizes of the id score list features.
device (Optional[torch.device]): device on which buffers will be allocated.
has_feature_processor (bool): existence of a feature processor (i.e. position weighted features).
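The hash-size arguments determine how ids are split into row-wise buckets. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the helper name block_bucketize, the ceil-based block size, and the remapping of each id to an offset inside its block are all assumptions made for illustration.

```python
import math
from typing import List


def block_bucketize(ids: List[int], hash_size: int, num_buckets: int) -> List[List[int]]:
    """Split ids from [0, hash_size) into num_buckets contiguous blocks.

    Hypothetical helper: assumes block_size = ceil(hash_size / num_buckets)
    and that each id is stored as its offset within its block.
    """
    block_size = math.ceil(hash_size / num_buckets)
    buckets: List[List[int]] = [[] for _ in range(num_buckets)]
    for i in ids:
        if not 0 <= i < hash_size:
            raise ValueError(f"id {i} is outside [0, {hash_size})")
        # Integer division picks the block; the remainder is the local row.
        buckets[i // block_size].append(i % block_size)
    return buckets


# One feature with hash size 10, bucketized across 2 devices on a host.
buckets = block_bucketize([0, 3, 5, 9], hash_size=10, num_buckets=2)
```

With a hash size of 10 and 2 buckets, the assumed block size is 5, so ids 0 and 3 fall into bucket 0 and ids 5 and 9 fall into bucket 1 as local rows 0 and 4.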

Example:

3 features
2 hosts with 2 devices each

Bucketize each feature into 2 buckets
Staggered shuffle with feature splits [2, 1]
AlltoAll operation

NOTE: the result of the staggered shuffle and the AlltoAll operation looks the same after
reordering in AlltoAll

Result:
    host 0 device 0:
        feature 0 bucket 0
        feature 1 bucket 0

    host 0 device 1:
        feature 0 bucket 1
        feature 1 bucket 1

    host 1 device 0:
        feature 2 bucket 0

    host 1 device 1:
        feature 2 bucket 1
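
The placement in the result above can be reproduced with a small simulation. This is only a sketch under stated assumptions: it does not call TorchRec or perform any communication, the function name twrw_layout is hypothetical, and it assumes features are assigned to hosts in order according to the feature splits, with bucket b of every feature on a host landing on device b of that host.

```python
from typing import Dict, List, Tuple

# (host, device) -> list of (feature, bucket) pairs placed there
Layout = Dict[Tuple[int, int], List[Tuple[int, int]]]


def twrw_layout(features_per_host: List[int], devices_per_host: int) -> Layout:
    """Hypothetical helper that maps feature buckets to (host, device) pairs."""
    layout: Layout = {}
    next_feature = 0
    for host, count in enumerate(features_per_host):
        # Table-wise step: this host owns `count` consecutive features.
        host_features = list(range(next_feature, next_feature + count))
        next_feature += count
        for device in range(devices_per_host):
            # Row-wise step: bucket `device` of each owned feature goes to this device.
            layout[(host, device)] = [(f, device) for f in host_features]
    return layout


# 3 features, 2 hosts with 2 devices each, feature splits [2, 1]
layout = twrw_layout([2, 1], devices_per_host=2)
for (host, device), shards in layout.items():
    print(f"host {host} device {device}:")
    for feature, bucket in shards:
        print(f"    feature {feature} bucket {bucket}")
```

The simulation covers only the final placement; the staggered shuffle and the AlltoAll exchange that move the data in the real class are not modeled.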

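Note: This article was selected and compiled by 纯净天空 from the original English work torchrec.distributed.sharding.twrw_sharding.TwRwSparseFeaturesDist on pytorch.org. Unless otherwise stated, copyright of the original code belongs to the original authors; please do not reproduce or copy this translation without permission or authorization.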