Python PyTorch Tacotron2TTSBundle用法及代碼示例

本文簡要介紹python語言中 torchaudio.pipelines.Tacotron2TTSBundle 的用法。

用法: class torchaudio.pipelines.Tacotron2TTSBundle

捆綁相關信息以使用預訓練 Tacotron2 和聲碼器的數據類。

該類提供了用於實例化預訓練模型的接口以及檢索預訓練權重所需的信息以及與模型一起使用的附加數據。

Torchaudio 庫實例化了這個類的對象，每個對象代表一個不同的預訓練模型。客戶端代碼應通過這些實例訪問預訓練模型。

請參閱下麵的用法和可用值。

示例 - 使用 Tacotron2 和 WaveRNN 的基於字符的 TTS 管道

>>> import torchaudio
>>>
>>> text = "Hello, T T S !"
>>> bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
>>>
>>> # Build processor, Tacotron2 and WaveRNN model
>>> processor = bundle.get_text_processor()
>>> tacotron2 = bundle.get_tacotron2()
Downloading:
100%|███████████████████████████████| 107M/107M [00:01<00:00, 87.9MB/s]
>>> vocoder = bundle.get_vocoder()
Downloading:
100%|███████████████████████████████| 16.7M/16.7M [00:00<00:00, 78.1MB/s]
>>>
>>> # Encode text
>>> input, lengths = processor(text)
>>>
>>> # Generate (mel-scale) spectrogram
>>> specgram, lengths, _ = tacotron2.infer(input, lengths)
>>>
>>> # Convert spectrogram to waveform
>>> waveforms, lengths = vocoder(specgram, lengths)
>>>
>>> torchaudio.save('hello-tts.wav', waveforms[0], vocoder.sample_rate)

示例 - 使用 Tacotron2 和 WaveRNN 的基於音素的 TTS 管道

>>>
>>> # Note:
>>> #     This bundle uses pre-trained DeepPhonemizer as
>>> #     the text pre-processor.
>>> #     Please install deep-phonemizer.
>>> #     See https://github.com/as-ideas/DeepPhonemizer
>>> #     The pretrained weight is automatically downloaded.
>>>
>>> import torchaudio
>>>
>>> text = "Hello, TTS!"
>>> bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONEME_LJSPEECH
>>>
>>> # Build processor, Tacotron2 and WaveRNN model
>>> processor = bundle.get_text_processor()
Downloading:
100%|███████████████████████████████| 63.6M/63.6M [00:04<00:00, 15.3MB/s]
>>> tacotron2 = bundle.get_tacotron2()
Downloading:
100%|███████████████████████████████| 107M/107M [00:01<00:00, 87.9MB/s]
>>> vocoder = bundle.get_vocoder()
Downloading:
100%|███████████████████████████████| 16.7M/16.7M [00:00<00:00, 78.1MB/s]
>>>
>>> # Encode text
>>> input, lengths = processor(text)
>>>
>>> # Generate (mel-scale) spectrogram
>>> specgram, lengths, _ = tacotron2.infer(input, lengths)
>>>
>>> # Convert spectrogram to waveform
>>> waveforms, lengths = vocoder(specgram, lengths)
>>>
>>> torchaudio.save('hello-tts.wav', waveforms[0], vocoder.sample_rate)

相關用法

注：本文由純淨天空篩選整理自pytorch.org大神的英文原創作品 torchaudio.pipelines.Tacotron2TTSBundle。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。