Python PyTorch profile: Usage and Code Examples

This article briefly describes how to use torch.profiler.profile in Python.

Usage: class torch.profiler.profile(*, activities=None, schedule=None, on_trace_ready=None, record_shapes=False, profile_memory=False, with_stack=False, with_flops=False, with_modules=False, use_cuda=None)

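Parameters:

activities (iterable) - list of activity groups (CPU, CUDA) to use in profiling. Supported values: torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA. Default: ProfilerActivity.CPU and (when available) ProfilerActivity.CUDA.
schedule (callable) - callable that takes step (int) as its single argument and returns a ProfilerAction value specifying the profiler action to perform at each step.
on_trace_ready (callable) - callable that is invoked at each step for which schedule returns ProfilerAction.RECORD_AND_SAVE during profiling.
record_shapes (bool) - save information about operator input shapes.
profile_memory (bool) - track tensor memory allocation and deallocation.
with_stack (bool) - record source information (file and line number) for the operators.
with_flops (bool) - use formulas to estimate the FLOPs (floating point operations) of specific operators (matrix multiplication and 2D convolution).
with_modules (bool) - record the module hierarchy (including function names) corresponding to the call stack of the operator. For example, if module A's forward calls module B's forward, which contains an aten::add operator, then the module hierarchy of aten::add is A.B. Note that this is currently supported only for TorchScript models, not for eager mode models.
use_cuda (bool) - deprecated since version 1.8.1: use activities instead.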

Profiler context manager.
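As a rough illustration of the context manager and a few of the parameters above, the following sketch profiles a small CPU-only workload. The tensors a and b and the choice of torch.mm are assumptions made for this example; they are not part of the original documentation.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Hypothetical workload: a small matrix multiplication on the CPU.
a = torch.randn(64, 64)
b = torch.randn(64, 64)

with profile(
    activities=[ProfilerActivity.CPU],  # CPU only, so no GPU is required
    record_shapes=True,                 # keep operator input shapes
    profile_memory=True,                # track tensor allocations and frees
) as prof:
    torch.mm(a, b)

# Grouping by input shape makes the recorded shapes visible in the table.
report = prof.key_averages(group_by_input_shape=True).table(
    sort_by="cpu_time_total", row_limit=10
)
print(report)
```

Restricting activities to the CPU keeps the sketch runnable on machines without CUDA; on a GPU machine, ProfilerActivity.CUDA could be added to the list.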

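Note

Use schedule() to generate the callable schedule. Non-default schedules are useful when profiling long training jobs, and allow the user to obtain multiple traces at different iterations of the training process. The default schedule simply records all events continuously for the duration of the context manager.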
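To make the behavior of a non-default schedule more concrete, the sketch below calls the object returned by torch.profiler.schedule directly and lists the action it reports for each step. The values wait=1, warmup=1, active=2 and the range of eight steps are chosen only for illustration.

```python
from torch.profiler import schedule

# The returned object is a plain callable: step number in, ProfilerAction out.
sched = schedule(wait=1, warmup=1, active=2)

# Collect the action names for the first eight steps (two full cycles).
actions = [sched(step).name for step in range(8)]
for step, name in enumerate(actions):
    print(step, name)
```

Each cycle here is four steps long (wait + warmup + active), and the last active step of a cycle is the one reported as RECORD_AND_SAVE.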

Note

Use tensorboard_trace_handler() to generate result files for TensorBoard:

on_trace_ready=torch.profiler.tensorboard_trace_handler(dir_name)

After profiling, the result files can be found in the specified directory. Use the command:

tensorboard --logdir dir_name

to view the results in TensorBoard. For more information, see PyTorch Profiler TensorBoard Plugin.

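Note

Enabling shape and stack tracing results in additional overhead. When record_shapes=True is specified, the profiler temporarily holds references to the tensors; this may further prevent certain optimizations that depend on the reference count and may introduce extra tensor copies.

Example: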

import torch

# code_to_profile() is a placeholder for the workload being measured
with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ]
) as p:
    code_to_profile()
print(p.key_averages().table(
    sort_by="self_cuda_time_total", row_limit=-1))

Using the profiler's schedule, on_trace_ready and step functions:

# Non-default profiler schedule allows user to turn profiler on and off
# on different iterations of the training loop;
# trace_handler is called every time a new trace becomes available
def trace_handler(prof):
    print(prof.key_averages().table(
        sort_by="self_cuda_time_total", row_limit=-1))
    # prof.export_chrome_trace("/tmp/test_trace_" + str(prof.step_num) + ".json")

with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],

    # In this example with wait=1, warmup=1, active=2,
    # profiler will skip the first step/iteration,
    # start warming up on the second, record
    # the third and the fourth iterations,
    # after which the trace will become available
    # and on_trace_ready (when set) is called;
    # the cycle repeats starting with the next step

    schedule=torch.profiler.schedule(
        wait=1,
        warmup=1,
        active=2),
    on_trace_ready=trace_handler
    # on_trace_ready=torch.profiler.tensorboard_trace_handler('./log')
    # used when outputting for tensorboard
) as p:
    # N and code_iteration_to_profile are placeholders for the real loop
    for step in range(N):
        code_iteration_to_profile(step)
        # send a signal to the profiler that the next iteration has started
        p.step()
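The loop above relies on the placeholders N and code_iteration_to_profile. A self-contained, CPU-only variant might look like the sketch below; the Linear model, the input tensor, the saved_steps list and the count of eight iterations are all assumptions made for illustration.

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

# Hypothetical stand-in workload: one forward pass of a small linear layer.
model = torch.nn.Linear(32, 32)
inputs = torch.randn(16, 32)

# Step numbers at which a finished trace was handed to the handler.
saved_steps = []

def trace_handler(prof):
    # Called by the profiler each time a recorded trace becomes available.
    saved_steps.append(prof.step_num)
    print(prof.key_averages().table(
        sort_by="self_cpu_time_total", row_limit=5))

with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=2),
    on_trace_ready=trace_handler,
) as p:
    for _ in range(8):
        model(inputs)
        # Tell the profiler that the current iteration has finished.
        p.step()

print("traces produced:", len(saved_steps))
```

With a four-step cycle, eight calls to p.step() cover two complete cycles, so the handler should fire once per completed cycle.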

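Note: This article was selected and compiled by 純淨天空 from the original English work torch.profiler.profile by the authors at pytorch.org. Unless otherwise stated, the copyright of the original code belongs to the original authors; please do not reproduce or copy this translation without permission or authorization.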