

Python PyTorch TwRwSparseFeaturesDist Usage and Code Examples


This article briefly introduces the usage of torchrec.distributed.sharding.twrw_sharding.TwRwSparseFeaturesDist in Python.

Usage:

class torchrec.distributed.sharding.twrw_sharding.TwRwSparseFeaturesDist(pg: torch._C._distributed_c10d.ProcessGroup, intra_pg: torch._C._distributed_c10d.ProcessGroup, id_list_features_per_rank: List[int], id_score_list_features_per_rank: List[int], id_list_feature_hash_sizes: List[int], id_score_list_feature_hash_sizes: List[int], device: Optional[torch.device] = None, has_feature_processor: bool = False)

Bases: torchrec.distributed.embedding_sharding.BaseSparseFeaturesDist[torchrec.distributed.embedding_types.SparseFeatures]

Bucketizes sparse features in a TWRW (table-wise then row-wise) fashion, then redistributes them with an AlltoAll collective operation.

Constructor arguments:

pg (dist.ProcessGroup): ProcessGroup for the AlltoAll communication.

intra_pg (dist.ProcessGroup): ProcessGroup within a single host group, used for the intra-host AlltoAll communication.

id_list_features_per_rank (List[int]): number of id list features to send to each rank.

id_score_list_features_per_rank (List[int]): number of id score list features to send to each rank.

id_list_feature_hash_sizes (List[int]): hash sizes of the id list features.

id_score_list_feature_hash_sizes (List[int]): hash sizes of the id score list features.

device (Optional[torch.device]): device on which buffers will be allocated.

has_feature_processor (bool): existence of a feature processor (i.e. position-weighted features).
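Below is a minimal construction sketch, not a prescribed recipe: it assumes a 2-host, 2-devices-per-host layout (world size 4) that has already been initialized with torch.distributed (for example via torchrun), and the per-rank feature counts and hash sizes are illustrative values chosen to line up with the example that follows.

import torch
import torch.distributed as dist
from torchrec.distributed.sharding.twrw_sharding import TwRwSparseFeaturesDist

# Assumed topology: 2 hosts with 2 devices each, i.e. global ranks 0-3.
LOCAL_SIZE = 2
world_size = dist.get_world_size()
rank = dist.get_rank()

pg = dist.group.WORLD  # global group used for the cross-host AlltoAll

# Every rank must call new_group() for every sub-group; each rank keeps the
# group that covers its own host as the intra-host group.
intra_pg = None
for host in range(world_size // LOCAL_SIZE):
    host_ranks = list(range(host * LOCAL_SIZE, (host + 1) * LOCAL_SIZE))
    group = dist.new_group(host_ranks)
    if rank in host_ranks:
        intra_pg = group

features_dist = TwRwSparseFeaturesDist(
    pg=pg,
    intra_pg=intra_pg,
    # Illustrative values for 3 id-list features: features 0 and 1 owned by
    # host 0, feature 2 by host 1, so each rank on host 0 receives 2 features
    # and each rank on host 1 receives 1.
    id_list_features_per_rank=[2, 2, 1, 1],
    id_score_list_features_per_rank=[0, 0, 0, 0],
    id_list_feature_hash_sizes=[8, 8, 8],      # assumed hash sizes
    id_score_list_feature_hash_sizes=[],
    device=torch.device("cpu"),
    has_feature_processor=False,
)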

Example:

3 features
2 hosts with 2 devices each

Bucketize each feature into 2 buckets
Staggered shuffle with feature splits [2, 1]
AlltoAll operation

NOTE: the result of the staggered shuffle and the AlltoAll operation looks the same
after reordering in AlltoAll

Result:
    host 0 device 0:
        feature 0 bucket 0
        feature 1 bucket 0

    host 0 device 1:
        feature 0 bucket 1
        feature 1 bucket 1

    host 1 device 0:
        feature 2 bucket 0

    host 1 device 1:
        feature 2 bucket 1
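To make the routing in the Result listing concrete, here is a plain-Python sketch, independent of the TorchRec API, that reproduces the same feature/bucket-to-device mapping. It assumes block bucketization (each feature's id range is split into contiguous blocks, one block per device on the owning host) and assumes features 0 and 1 are owned by host 0 and feature 2 by host 1; the hash size of 8 per feature is an arbitrary illustrative value.

import math

# Assumed layout matching the example: 3 features, 2 hosts, 2 devices per host.
feature_to_host = {0: 0, 1: 0, 2: 1}   # assumed table-wise assignment of features
hash_sizes = {0: 8, 1: 8, 2: 8}        # assumed hash size per feature
NUM_BUCKETS = 2                        # devices per host = buckets per feature

def bucket_of(feature, idx):
    # Block bucketization: contiguous id blocks, one block per device.
    block_size = math.ceil(hash_sizes[feature] / NUM_BUCKETS)
    return idx // block_size

def destination(feature, idx):
    # (host, device) that ends up owning this id after bucketize + AlltoAll.
    return feature_to_host[feature], bucket_of(feature, idx)

for feature in range(3):
    for idx in (0, hash_sizes[feature] - 1):   # one id from each end of the range
        host, device = destination(feature, idx)
        print(f"feature {feature} id {idx} -> host {host} device {device}")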

Note: this article was selected and compiled by 純淨天空 from the original English documentation for torchrec.distributed.sharding.twrw_sharding.TwRwSparseFeaturesDist on pytorch.org. Unless otherwise stated, the copyright of the original code belongs to its original authors; please do not reproduce or copy this translation without permission or authorization.