Python PyTorch apply_effects_file: usage and code examples

This article briefly describes how to use torchaudio.sox_effects.apply_effects_file in Python.

Usage: torchaudio.sox_effects.apply_effects_file(path: str, effects: List[List[str]], normalize: bool = True, channels_first: bool = True, format: Optional[str] = None) → Tuple[torch.Tensor, int]

Parameters:

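path (path-like object or file-like object) -
Source of the audio data. When the function is not compiled by TorchScript (e.g. torch.jit.script), the following types are accepted:
- path-like: a file path
- file-like: an object with a read(size: int) -> bytes method, which returns a byte string of at most size bytes.
When the function is compiled by TorchScript, only the str type is allowed.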

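Note: this parameter is intentionally annotated as str only for compatibility with the TorchScript compiler.
effects (List[List[str]]) - List of effects.
normalize (bool, optional) - When True, this function always returns float32, and sample values are normalized to [-1.0, 1.0]. If the input file is an integer WAV, passing False changes the resulting tensor type to an integer type. This argument has no effect for formats other than integer WAV.
channels_first (bool, optional) - When True, the returned tensor has dimensions [channel, time]. Otherwise, the returned tensor has dimensions [time, channel].
format (str or None, optional) - Overrides format detection with the given format. Providing this argument may help when libsox cannot infer the format from the header or the extension.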
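The effects argument takes the same tokens you would type after a sox command, split into one list of strings per effect. The sketch below shows that shape with a small helper; parse_effects and its ';' separator are hypothetical conveniences for illustration, not part of torchaudio.

```python
import shlex
from typing import List


def parse_effects(spec: str) -> List[List[str]]:
    """Split a ';'-separated effect string into the List[List[str]] form.

    Hypothetical helper: the ';' separator is a convention of this sketch only.
    """
    effects = []
    for chunk in spec.split(";"):
        tokens = shlex.split(chunk)  # effect name followed by its arguments
        if tokens:  # skip empty segments such as a trailing ';'
            effects.append(tokens)
    return effects


effects = parse_effects("gain -n; pitch 5; rate 8000")
print(effects)
```

The resulting list has the same form as the effects lists used in the examples in this article and can be passed as the effects argument.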

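Returns:

The resulting tensor and the sample rate. If normalize=True, the resulting tensor is always of type float32. If normalize=False and the input audio file is an integer WAV file, the resulting tensor has the corresponding integer type (note that the 24-bit integer type is not supported). If channels_first=True, the resulting tensor has dimensions [channel, time], otherwise [time, channel].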

Return type:

(Tensor, int)
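To see how normalize and channels_first affect the result, it helps to load the same integer WAV twice. The following is a minimal sketch, assuming a torchaudio build that still ships sox_effects with libsox support. write_test_wav and load_both_ways are hypothetical helpers; the test file is generated with the standard-library wave module so the example does not depend on an existing recording.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def write_test_wav(path, sample_rate=16000, seconds=1.0, channels=2):
    """Write a 16-bit PCM sine wave so there is an integer WAV to load."""
    n_frames = int(sample_rate * seconds)
    frames = bytearray()
    for i in range(n_frames):
        value = int(0.5 * 32767 * math.sin(2 * math.pi * 440 * i / sample_rate))
        frames += struct.pack("<h", value) * channels  # same sample on every channel
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit integers
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_frames


def load_both_ways(path):
    """Load the file once with the default flags and once with both turned off."""
    # Imported lazily so the WAV helper above works without torchaudio installed.
    import torchaudio

    effects = [["rate", "8000"]]
    normalized, sample_rate = torchaudio.sox_effects.apply_effects_file(
        str(path), effects)
    raw, _ = torchaudio.sox_effects.apply_effects_file(
        str(path), effects, normalize=False, channels_first=False)
    return normalized, raw, sample_rate


wav_path = Path(tempfile.mkdtemp()) / "example.wav"
n_frames = write_test_wav(wav_path)
```

Going by the description above, load_both_ways(wav_path) should return a float32 tensor laid out as [channel, time] and an integer tensor laid out as [time, channel]. Checking .dtype and .shape on both confirms this on a given installation.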

Apply sox effects to an audio file and load the resulting data as a Tensor.

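Note

This function works much like the sox command, with slight differences. For example, the sox command automatically adds certain effects (such as a rate effect after speed, pitch, etc.), whereas this function applies only the effects it is given. Therefore, to actually apply the speed effect, you also need to supply a rate effect with the desired sample rate, because internally the speed effect only changes the sample rate and leaves the samples untouched.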
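The arithmetic behind this note can be sketched in plain Python. Both helper names below are hypothetical. The sample count is an estimate that assumes the resampler produces exactly duration × rate samples, while real output may differ by a few samples.

```python
from typing import List, Tuple


def speed_effects(speed: float, target_rate: int) -> List[List[str]]:
    """Pair 'speed' with an explicit 'rate' so the samples are actually resampled."""
    return [["speed", f"{speed:.5f}"], ["rate", str(target_rate)]]


def after_speed_and_rate(num_samples: int, source_rate: int,
                         speed: float, target_rate: int) -> Tuple[float, int]:
    """Estimate duration and sample count after 'speed' followed by 'rate'."""
    # 'speed' alone relabels the stream: same samples, rate becomes source_rate * speed.
    relabelled_rate = source_rate * speed
    duration = num_samples / relabelled_rate
    # 'rate' then resamples to target_rate, so the count follows the new duration.
    return duration, round(duration * target_rate)


print(speed_effects(2.0, 8000))
print(after_speed_and_rate(16000, 16000, 2.0, 8000))
```

With a one-second clip at 16000 Hz and a speed factor of 2.0, the estimate is half a second of audio, or about 4000 samples at 8000 Hz. If the rate effect were omitted, the tensor would still hold the original 16000 samples and only the reported sample rate would change.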

Example - Basic usage

>>> from torchaudio.sox_effects import apply_effects_file
>>> # Defines the effects to apply
>>> effects = [
...     ['gain', '-n'],  # normalises to 0dB
...     ['pitch', '5'],  # 5 cent pitch shift
...     ['rate', '8000'],  # resample to 8000 Hz
... ]
>>>
>>> # Apply effects and load data with channels_first=True
>>> waveform, sample_rate = apply_effects_file("data.wav", effects, channels_first=True)
>>>
>>> # Check the result
>>> waveform.shape
torch.Size([2, 8000])
>>> waveform
tensor([[ 5.1151e-03,  1.8073e-02,  2.2188e-02,  ...,  1.0431e-07,
         -1.4761e-07,  1.8114e-07],
        [-2.6924e-03,  2.1860e-03,  1.0650e-02,  ...,  6.4122e-07,
         -5.6159e-07,  4.8103e-07]])
>>> sample_rate
8000

Example - Applying random speed perturbation to a dataset

>>> import random
>>> from typing import List
>>> import torch
>>> import torchaudio
>>>
>>> # Load data from file, apply random speed perturbation
>>> class RandomPerturbationFile(torch.utils.data.Dataset):
...     """Given flist, apply random speed perturbation
...
...     Suppose all the input files are at least one second long.
...     """
...     def __init__(self, flist: List[str], sample_rate: int):
...         super().__init__()
...         self.flist = flist
...         self.sample_rate = sample_rate
...
...     def __getitem__(self, index):
...         speed = 0.5 + 1.5 * random.random()  # uniform in [0.5, 2.0)
...         effects = [
...             ['gain', '-n', '-10'],  # apply 10 db attenuation
...             ['remix', '-'],  # merge all the channels
...             ['speed', f'{speed:.5f}'],  # duration is now 0.5 ~ 2.0 seconds.
...             ['rate', f'{self.sample_rate}'],
...             ['pad', '0', '1.5'],  # add 1.5 seconds silence at the end
...             ['trim', '0', '2'],  # get the first 2 seconds
...         ]
...         waveform, _ = torchaudio.sox_effects.apply_effects_file(
...             self.flist[index], effects)
...         return waveform
...
...     def __len__(self):
...         return len(self.flist)
...
>>> dataset = RandomPerturbationFile(file_list, sample_rate=8000)
>>> loader = torch.utils.data.DataLoader(dataset, batch_size=32)
>>> for batch in loader:
...     pass
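The pad and trim effects at the end of the chain are what make every item the same length, which the DataLoader needs in order to stack a batch. A minimal sketch of that reasoning follows, using a hypothetical clip_seconds helper and assuming speed factors between 0.5 and 2.0 as in the comment in the example above.

```python
def clip_seconds(input_seconds: float, speed: float,
                 pad: float = 1.5, trim: float = 2.0) -> float:
    """Duration after speed -> pad -> trim, mirroring the effect chain above."""
    stretched = input_seconds / speed  # speed > 1 shortens, speed < 1 lengthens
    padded = stretched + pad           # silence appended at the end
    return min(padded, trim)           # keep at most the first `trim` seconds


sample_rate = 8000
for speed in (0.5, 1.0, 2.0):
    seconds = clip_seconds(1.0, speed)
    print(speed, seconds, int(seconds * sample_rate))
```

For a one-second input, every speed in that range yields exactly 2 seconds, or 16000 samples at 8000 Hz. An input shorter than one second can fall short of 2 seconds at high speed factors, which is why the docstring assumes files of at least one second.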

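Note: This article was selected and compiled by 纯净天空 from the original English work torchaudio.sox_effects.apply_effects_file by the authors at pytorch.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.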
