In traditional speech denoising tasks, clean audio signals are typically used as the training target, but truly clean signals can only be collected with expensive recording equipment or in studios with strictly controlled environments. To overcome this drawback, we propose an end-to-end self-supervised speech denoising training scheme that uses only noisy audio signals, named Only-Noisy Training (ONT), and requires no extra training conditions. The proposed ONT strategy constructs training pairs from each single noisy audio signal alone, and it contains two modules: a training-pair generation module and a speech denoising module. The first module applies a random audio sub-sampler to each noisy audio signal to generate training pairs. The sub-sampled pairs are then fed into a novel complex-valued speech denoising module. Experimental results show that the proposed method not only eliminates the heavy dependence on clean targets found in traditional audio denoising tasks, but also achieves performance on par with or better than other training strategies. Availability: ONT is available at https://github.com/liqingchunnnn/Only-Noisy-Training
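To make the pair-generation step concrete, the following is a minimal sketch of one way a random audio sub-sampler could build a training pair from a single noisy waveform. The abstract does not specify the sampling rule, so everything here is an assumption for illustration: the function name `random_subsample_pair`, the block size `k`, and the rule itself (split the waveform into non-overlapping blocks of `k` samples and draw two different positions from each block). It is not the authors' implementation; see the repository above for that.

```python
import numpy as np


def random_subsample_pair(noisy, k=2, rng=None):
    """Hypothetical sub-sampler: returns two sub-sampled signals from one noisy waveform.

    Assumed rule (not taken from the paper): the waveform is cut into
    non-overlapping blocks of k samples, and two distinct positions are
    drawn at random from every block, one for each output signal.
    """
    if k < 2:
        raise ValueError("k must be at least 2 to draw two distinct samples")
    rng = np.random.default_rng() if rng is None else rng

    noisy = np.asarray(noisy)
    n_blocks = len(noisy) // k
    # Drop the tail that does not fill a whole block, then view as (n_blocks, k).
    blocks = noisy[: n_blocks * k].reshape(n_blocks, k)

    # Position of the first sample in each block, in [0, k).
    first = rng.integers(0, k, size=n_blocks)
    # A non-zero cyclic offset in [1, k) guarantees a different position.
    offset = rng.integers(1, k, size=n_blocks)
    second = (first + offset) % k

    rows = np.arange(n_blocks)
    return blocks[rows, first], blocks[rows, second]


# Usage: a synthetic one-second "noisy" signal at 16 kHz (made-up data).
rng = np.random.default_rng(0)
noisy_audio = rng.standard_normal(16000)
sub_a, sub_b = random_subsample_pair(noisy_audio, k=2, rng=rng)
```

Under this assumed rule, the two outputs each have `len(noisy) // k` samples and never share a source sample, so one can serve as the network input and the other as the training target in place of a clean reference.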