In this paper, we propose a novel Single Noisy Audio De-noising Framework (SNA-DF) for speech denoising that uses only single noisy audio samples, which removes the need to construct either noisy-clean training pairs or multiple independent noisy audio samples. The proposed SNA-DF contains two modules: a training audio pair generation module and an audio denoising module. The first module applies a random audio sub-sampler to single noisy audio samples to generate training audio pairs. The sub-sampled training audio pairs are then fed into the audio denoising module, which employs a deep complex U-Net incorporating a complex two-stage transformer (cTSTM) to extract both magnitude and phase information, taking full advantage of the complex features of single noisy audio samples. Experimental results show that the proposed SNA-DF not only eliminates the heavy dependence of traditional audio denoising methods on clean targets, but also outperforms methods that use multiple noisy audio samples.
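To make the pair-generation idea concrete, the following is a minimal sketch of one way a random audio sub-sampler could turn a single noisy waveform into a training pair. The abstract does not specify the sampling scheme, so the scheme here is an assumption: the waveform is cut into non-overlapping blocks of two adjacent samples, and each block contributes one randomly chosen sample to each output. The function name `random_subsample_pair` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np


def random_subsample_pair(noisy, rng=None):
    """Split one noisy waveform into two half-length waveforms.

    Assumed scheme (not confirmed by the source): the waveform is cut into
    non-overlapping blocks of two adjacent samples. In each block, one sample
    picked at random goes to the first output and the other sample goes to
    the second output.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.asarray(noisy)

    # Drop a trailing sample if the length is odd so the blocks are complete.
    n_blocks = noisy.shape[0] // 2
    blocks = noisy[: 2 * n_blocks].reshape(n_blocks, 2)

    # For every block, draw 0 or 1 to decide which sample goes to `first`.
    pick = rng.integers(0, 2, size=n_blocks)
    rows = np.arange(n_blocks)

    first = blocks[rows, pick]
    second = blocks[rows, 1 - pick]
    return first, second


if __name__ == "__main__":
    # One second of synthetic "noisy" audio at 16 kHz (illustrative data only).
    generator = np.random.default_rng(0)
    waveform = generator.standard_normal(16000)
    first, second = random_subsample_pair(waveform, generator)
    print(first.shape, second.shape)
```

Under this assumption, the two outputs come from neighbouring time positions of the same recording, so one could serve as the network input and the other as the training target in place of a clean reference.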