Python PyTorch profile: usage and code examples

This article briefly describes how to use torch.profiler.profile in Python.

Usage: class torch.profiler.profile(*, activities=None, schedule=None, on_trace_ready=None, record_shapes=False, profile_memory=False, with_stack=False, with_flops=False, with_modules=False, use_cuda=None)

Parameters:

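activities (iterable) - list of activity groups (CPU, CUDA) to use in profiling. Supported values: torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA. Default: ProfilerActivity.CPU and (when available) ProfilerActivity.CUDA.
schedule (callable) - a callable that takes step (int) as its single argument and returns a ProfilerAction value specifying the profiler action to perform at each step.
on_trace_ready (callable) - a callable invoked at each step where schedule returns ProfilerAction.RECORD_AND_SAVE during profiling.
record_shapes (bool) - save information about operators' input shapes.
profile_memory (bool) - track tensor memory allocation and deallocation.
with_stack (bool) - record source information (file and line number) for the ops.
with_flops (bool) - use formulas to estimate the FLOPs (floating-point operations) of specific operators (matrix multiplication and 2D convolution).
with_modules (bool) - record the module hierarchy (including function names) corresponding to the call stack of the op. For example, if module A's forward calls module B's forward, which contains an aten::add op, then the module hierarchy of aten::add is A.B. Note that this support currently exists only for TorchScript models, not for eager mode models.
use_cuda (bool) -

Deprecated since version 1.8.1: use activities instead.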

Profiler context manager.
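As an illustration of how the parameters above combine, here is a minimal CPU-only sketch. The workload (a single 64x64 matrix multiplication) and the variable names are assumptions chosen for the example; they do not come from the original documentation.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Illustrative workload: a small matrix multiplication on the CPU.
x = torch.randn(64, 64)

with profile(
    activities=[ProfilerActivity.CPU],  # CPU only, so no GPU is required
    record_shapes=True,                 # keep operator input shapes
    profile_memory=True,                # track tensor allocations and frees
) as prof:
    y = torch.mm(x, x)

# Group the averaged events by input shape and show the ten most expensive.
table = prof.key_averages(group_by_input_shape=True).table(
    sort_by="cpu_time_total", row_limit=10)
print(table)
```

Grouping by input shape is only meaningful when record_shapes=True, which is why the two are used together here.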

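Note

Use schedule() to generate the callable schedule. Non-default schedules are useful when profiling long training jobs and allow the user to obtain multiple traces at different iterations of the training process. The default schedule simply records all events continuously for the duration of the context manager.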
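To make the behaviour of a non-default schedule concrete, the sketch below calls the object returned by torch.profiler.schedule directly, without running a profiler, and prints the action it yields for the first few steps. The values wait=1, warmup=1, active=2 match the example further down; the step range of 6 is an arbitrary choice for illustration.

```python
import torch
from torch.profiler import ProfilerAction, schedule

# Build the schedule callable; it maps a step index to a ProfilerAction.
sched = schedule(wait=1, warmup=1, active=2)

# Query the action for steps 0..5 without running any profiling.
actions = [sched(step) for step in range(6)]
for step, action in enumerate(actions):
    print(step, action.name)
```

With these settings one cycle spans four steps: one idle step, one warm-up step, and two recorded steps, the last of which also marks the trace as ready to be saved. The cycle then starts over.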

Note

Use tensorboard_trace_handler() to generate result files for TensorBoard:

on_trace_ready=torch.profiler.tensorboard_trace_handler(dir_name)

After profiling, the result files can be found in the specified directory. Use the command:

tensorboard --logdir dir_name

to view the results in TensorBoard. For more information, see PyTorch Profiler TensorBoard Plugin.
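A minimal sketch of the handler in use, assuming a CPU-only run and a temporary directory as the log location (both are choices made for this example, not part of the original text). With the default schedule, the handler is expected to run once, when the context manager exits.

```python
import os
import tempfile

import torch
from torch.profiler import profile, ProfilerActivity, tensorboard_trace_handler

# Hypothetical log location: a fresh temporary directory.
log_dir = tempfile.mkdtemp(prefix="profiler_logs_")

with profile(
    activities=[ProfilerActivity.CPU],
    on_trace_ready=tensorboard_trace_handler(log_dir),
) as prof:
    # Illustrative workload.
    torch.mm(torch.randn(32, 32), torch.randn(32, 32))

# The handler writes one trace file per completed trace into log_dir.
trace_files = os.listdir(log_dir)
print(trace_files)
```

The directory printed here is the one to pass to tensorboard --logdir.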

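Note

Enabling shape and stack tracing results in additional overhead. When record_shapes=True is specified, the profiler temporarily holds references to the tensors; this may further prevent certain optimizations that depend on the reference count and may introduce extra tensor copies.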

Example:

import torch

# code_to_profile() is a placeholder for the workload being measured.
with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ]
) as p:
    code_to_profile()
print(p.key_averages().table(
    sort_by="self_cuda_time_total", row_limit=-1))

Using the profiler's schedule, on_trace_ready and step functions:

# Non-default profiler schedule allows user to turn profiler on and off
# on different iterations of the training loop;
# trace_handler is called every time a new trace becomes available
def trace_handler(prof):
    print(prof.key_averages().table(
        sort_by="self_cuda_time_total", row_limit=-1))
    # prof.export_chrome_trace("/tmp/test_trace_" + str(prof.step_num) + ".json")

with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],

    # In this example with wait=1, warmup=1, active=2,
    # profiler will skip the first step/iteration,
    # start warming up on the second, record
    # the third and the fourth iterations,
    # after which the trace will become available
    # and on_trace_ready (when set) is called;
    # the cycle repeats starting with the next step

    schedule=torch.profiler.schedule(
        wait=1,
        warmup=1,
        active=2),
    on_trace_ready=trace_handler
    # on_trace_ready=torch.profiler.tensorboard_trace_handler('./log')
    # used when outputting for tensorboard
    ) as p:
        for iter in range(N):
            code_iteration_to_profile(iter)
            # send a signal to the profiler that the next iteration has started
            p.step()
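The loop above relies on the placeholders N and code_iteration_to_profile. The following is a self-contained CPU-only sketch of the same pattern, assuming eight iterations and a small matrix multiplication as a stand-in workload; the names num_steps, trace_calls and run_iteration are hypothetical and used only for this example.

```python
import torch
from torch.profiler import profile, schedule, ProfilerActivity

num_steps = 8      # illustrative number of training iterations
trace_calls = []   # step numbers at which a trace became available

def trace_handler(prof):
    # Called each time a full wait/warmup/active cycle has been recorded.
    trace_calls.append(prof.step_num)

def run_iteration(step):
    # Stand-in for one training iteration.
    a = torch.randn(32, 32)
    return torch.mm(a, a)

with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=2),
    on_trace_ready=trace_handler,
) as p:
    for step in range(num_steps):
        run_iteration(step)
        # Tell the profiler that the next iteration is about to start.
        p.step()

print(len(trace_calls))
```

Each cycle covers four steps, so eight iterations should complete two cycles and invoke the handler twice.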


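Note: this article was selected and compiled by 纯净天空 from the original English documentation for torch.profiler.profile on pytorch.org. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.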