Python PyTorch Wav2Vec2ASRBundle: Usage and Code Examples

This article briefly introduces the usage of torchaudio.pipelines.Wav2Vec2ASRBundle in Python.

Usage: class torchaudio.pipelines.Wav2Vec2ASRBundle

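A data class that bundles the information needed to use a pretrained Wav2Vec2Model.

The class provides an interface for instantiating the pretrained model, together with the information needed to retrieve the pretrained weights and the additional data used with the model, such as the output labels and the expected sample rate.

The torchaudio library instantiates objects of this class, each of which represents a different pretrained model. Client code should access pretrained models through these instances.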
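
The bundle pattern described above can be pictured with a small toy example. This is only a sketch of the idea, not torchaudio's implementation: the class `ToyASRBundle`, its fields, the file names, and the instances `TOY_ASR_BASE` and `TOY_ASR_LARGE` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToyASRBundle:
    """Hypothetical stand-in for a bundle; NOT torchaudio's implementation."""

    weights_path: str        # where pretrained weights would be fetched from (made up)
    sample_rate: int         # sampling rate the model expects
    labels: Tuple[str, ...]  # output classes of the model

    def get_labels(self) -> Tuple[str, ...]:
        # Return the output labels that belong to this pretrained model.
        return self.labels


# The library defines the instances once; client code only picks one of them.
TOY_ASR_BASE = ToyASRBundle("toy_base.pt", 16000, ("<s>", "|", "A", "B"))
TOY_ASR_LARGE = ToyASRBundle("toy_large.pt", 16000, ("<s>", "|", "A", "B"))

bundle = TOY_ASR_LARGE
print(bundle.sample_rate, bundle.get_labels())
```

Because the toy data class is frozen, client code can read the bundled metadata but cannot modify it by accident, which fits the idea that each instance stands for one fixed pretrained model.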

See below for the usage and the available values.

Example - ASR

>>> import torchaudio
>>>
>>> bundle = torchaudio.pipelines.HUBERT_ASR_LARGE
>>>
>>> # Build the model and load the pretrained weights.
>>> model = bundle.get_model()
Downloading:
100%|███████████████████████████████| 1.18G/1.18G [00:17<00:00, 73.8MB/s]
>>>
>>> # Check the corresponding labels of the output.
>>> labels = bundle.get_labels()
>>> print(labels)
('<s>', '<pad>', '</s>', '<unk>', '|', 'E', 'T', 'A', 'O', 'N', 'I', 'H', 'S', 'R', 'D', 'L', 'U', 'M', 'W', 'C', 'F', 'G', 'Y', 'P', 'B', 'V', 'K', "'", 'X', 'J', 'Q', 'Z')
>>>
>>> # `waveform` and `sample_rate` are assumed to be loaded beforehand, e.g. with torchaudio.load
>>> # Resample audio to the expected sampling rate
>>> waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)
>>>
>>> # Infer the label probability distribution
>>> emissions, _ = model(waveform)
>>>
>>> # Pass the emissions to a decoder
>>> # `ctc_decode` is for illustration purposes only
>>> transcripts = ctc_decode(emissions, labels)
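
The example leaves `ctc_decode` undefined. One common choice is greedy CTC decoding, sketched below under stated assumptions. The function name `greedy_ctc_decode` and its parameters are hypothetical; it decodes a single utterance, and it assumes that index 0 is the CTC blank and that `'|'` marks word boundaries. Other special tokens such as `'<pad>'` or `'<unk>'` are not filtered out here.

```python
from typing import Sequence


def greedy_ctc_decode(emission, labels: Sequence[str], blank: int = 0, sep: str = "|") -> str:
    """Greedy CTC decoding for one utterance (illustrative sketch).

    `emission` is a (num_frames, num_labels) score matrix, given either as
    nested lists or as an object with `.tolist()` such as a 2-D tensor.
    """
    frames = emission.tolist() if hasattr(emission, "tolist") else emission
    # 1. Pick the highest-scoring label index in every frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frames]
    # 2. Collapse consecutive repeats and 3. drop the blank index.
    chars = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            chars.append(labels[idx])
        prev = idx
    # 4. Turn the word separator into a space.
    return "".join(chars).replace(sep, " ").strip()


# Tiny hand-made example: one-hot scores following the path H H _ I | H _ H.
labels = ("<s>", "|", "H", "I")
path = [2, 2, 0, 3, 1, 2, 0, 2]
emission = [[1.0 if j == i else 0.0 for j in range(len(labels))] for i in path]
print(greedy_ctc_decode(emission, labels))  # HI HH
```

With the model output from the example above, which has a batch dimension, a call would look like `greedy_ctc_decode(emissions[0], labels)`. Greedy decoding is the simplest option; a beam search decoder with a language model usually gives better transcripts.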

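Note: This article was selected and compiled by 純淨天空 from the original English documentation torchaudio.pipelines.Wav2Vec2ASRBundle on pytorch.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.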