

Python PyTorch TwRwSparseFeaturesDist Usage and Code Examples


This article briefly introduces the usage of torchrec.distributed.sharding.twrw_sharding.TwRwSparseFeaturesDist in Python.

Usage:

class torchrec.distributed.sharding.twrw_sharding.TwRwSparseFeaturesDist(pg: torch._C._distributed_c10d.ProcessGroup, intra_pg: torch._C._distributed_c10d.ProcessGroup, id_list_features_per_rank: List[int], id_score_list_features_per_rank: List[int], id_list_feature_hash_sizes: List[int], id_score_list_feature_hash_sizes: List[int], device: Optional[torch.device] = None, has_feature_processor: bool = False)

Bases: torchrec.distributed.embedding_sharding.BaseSparseFeaturesDist[torchrec.distributed.embedding_types.SparseFeatures]

Bucketizes sparse features in a TWRW fashion, then redistributes them with an AlltoAll collective operation.

Constructor Args:

pg (dist.ProcessGroup): ProcessGroup for AlltoAll communication.

intra_pg (dist.ProcessGroup): ProcessGroup within a single host group for AlltoAll communication.

id_list_features_per_rank (List[int]): number of id list features to send to each rank.

id_score_list_features_per_rank (List[int]): number of id score list features to send to each rank.

id_list_feature_hash_sizes (List[int]): hash sizes of the id list features.

id_score_list_feature_hash_sizes (List[int]): hash sizes of the id score list features.

device (Optional[torch.device]): device on which buffers will be allocated.

has_feature_processor (bool): existence of a feature processor (i.e. position-weighted features).
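Putting the arguments together, a construction call might look like the following sketch for the 3-feature, 2-host x 2-device layout used in the example below. The process-group setup, rank layout, and hash sizes are illustrative assumptions and are not taken from the original documentation.

    # A minimal construction sketch. Assumes torch.distributed has already been
    # initialized with 4 ranks (2 hosts x 2 devices); the rank layout and hash
    # sizes below are illustrative only.
    import torch
    import torch.distributed as dist
    from torchrec.distributed.sharding.twrw_sharding import TwRwSparseFeaturesDist

    pg = dist.new_group(ranks=[0, 1, 2, 3])   # global group for cross-host AlltoAll
    intra_pg = dist.new_group(ranks=[0, 1])   # ranks of the local host (host 0 in this sketch)

    features_dist = TwRwSparseFeaturesDist(
        pg=pg,
        intra_pg=intra_pg,
        id_list_features_per_rank=[2, 2, 1, 1],        # id list features routed to each of the 4 ranks
        id_score_list_features_per_rank=[0, 0, 0, 0],  # no weighted (id score) features in this sketch
        id_list_feature_hash_sizes=[100, 200, 300],    # one hash size per id list feature (assumed values)
        id_score_list_feature_hash_sizes=[],
        device=torch.device("cuda"),
        has_feature_processor=False,
    )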

Example:

3 features
2 hosts with 2 devices each

Bucketize each feature into 2 buckets
Staggered shuffle with feature splits [2, 1]
AlltoAll operation

NOTE: the results of the staggered shuffle and the AlltoAll operation look the same after
reordering in AlltoAll

Result:
    host 0 device 0:
        feature 0 bucket 0
        feature 1 bucket 0

    host 0 device 1:
        feature 0 bucket 1
        feature 1 bucket 1

    host 1 device 0:
        feature 2 bucket 0

    host 1 device 1:
        feature 2 bucket 1
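To make the placement above concrete, here is a standalone Python sketch (it does not use torchrec) that recomputes which (feature, bucket) pair lands on which device under the assumed layout of the example: 3 features, 2 buckets per feature, feature splits [2, 1] across 2 hosts with 2 devices each. All variable names in it are illustrative.

    # Standalone sketch reproducing the placement shown above; illustrative only.
    NUM_DEVICES_PER_HOST = 2
    FEATURE_SPLITS = [2, 1]  # host 0 owns features 0-1, host 1 owns feature 2

    placement = {}  # (host, device) -> list of (feature, bucket)
    feature_offset = 0
    for host, num_host_features in enumerate(FEATURE_SPLITS):
        for device in range(NUM_DEVICES_PER_HOST):
            # each device of a host receives one bucket of every feature the host owns
            placement[(host, device)] = [
                (feature_offset + f, device) for f in range(num_host_features)
            ]
        feature_offset += num_host_features

    for (host, device), entries in sorted(placement.items()):
        print(f"host {host} device {device}:")
        for feature, bucket in entries:
            print(f"    feature {feature} bucket {bucket}")

Running this prints the same assignment as the Result block above: host 0's two devices each hold one bucket of features 0 and 1, while host 1's devices each hold one bucket of feature 2.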

Note: This article was curated and adapted by 纯净天空 from the original English documentation on pytorch.org for torchrec.distributed.sharding.twrw_sharding.TwRwSparseFeaturesDist. Unless otherwise stated, the copyright of the original code belongs to its original authors; this translation may not be reproduced or copied without permission.