Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating non-target signals from Gaussian noise and noisy signals) can be used to restore clean signals. Based on this property, we propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals. The fundamental architecture of the proposed DiffuSE model is similar to that of DiffWave, a high-quality audio waveform generation model with a relatively low computational cost and footprint. To attain better enhancement performance, we designed an advanced reverse process, termed the supportive reverse process, which adds noisy speech to the predicted speech at each time step. The experimental results show that DiffuSE yields performance comparable to that of related audio generative models on the standardized Voice Bank corpus SE task. Moreover, relative to the generally suggested full sampling schedule, the proposed supportive reverse process especially improves fast sampling: with only a few steps, it yields better enhancement results than the conventional full-step inference process.
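As a rough illustration of the idea of adding noisy speech to the predicted speech at each time step, the sketch below runs a standard DDPM-style reverse loop and blends the noisy input into the intermediate estimate. It is a minimal sketch under stated assumptions, not the DiffuSE implementation: the blend weight `mix_weight`, the linear noise schedule, the choice to start from the noisy speech, and the `predict_noise` callable (a stand-in for a trained DiffWave-like network) are all hypothetical.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Hypothetical linear noise schedule; the values are illustrative only."""
    return np.linspace(beta_start, beta_end, num_steps)


def supportive_reverse_process(noisy, predict_noise, betas, mix_weight=0.2, rng=None):
    """DDPM-style reverse loop that re-injects the noisy speech at each step.

    noisy         : 1-D array holding the noisy speech waveform.
    predict_noise : callable (x_t, noisy, t) -> estimated noise; stands in
                    for a trained network (assumption).
    mix_weight    : hypothetical blend factor, not a value from the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Assumption: start from the noisy speech instead of pure Gaussian noise.
    x = noisy.astype(float)
    for t in range(len(betas) - 1, -1, -1):
        eps = predict_noise(x, noisy, t)
        # Standard DDPM posterior mean for the previous step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # "Supportive" step: blend the noisy speech into the prediction.
            mean = (1.0 - mix_weight) * mean + mix_weight * noisy
            sigma = np.sqrt((1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t])
            x = mean + sigma * rng.standard_normal(x.shape)
        else:
            x = mean
    return x


# Usage with a placeholder denoiser that predicts zero noise (illustration only).
betas = linear_beta_schedule(6)  # a short, "fast sampling" style schedule
noisy_speech = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))
enhanced = supportive_reverse_process(
    noisy_speech, lambda x, y, t: np.zeros_like(x), betas
)
```

With a real trained network in place of the placeholder, shortening `betas` would correspond to a fast sampling schedule, while a long schedule would correspond to full-step inference.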