Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still produce distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they still lag behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. More specifically, we propose a generalized formulation of the diffusion probabilistic model, the conditional diffusion probabilistic model, whose reverse process can adapt to real, non-Gaussian noise in the estimated speech signal. In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models, and we investigate how well our models generalize to other datasets whose noise characteristics were unseen during training.
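To make the idea of conditioning the diffusion process on the noisy observation concrete, the following is a minimal sketch rather than the formulation used in this work. It assumes a forward step whose mean interpolates between the clean signal `x0` and the noisy signal `y`. The function name, the interpolation weight `m_t`, and the variance expression are illustrative assumptions; the variance is chosen only so that `m_t = 0` recovers the usual Gaussian diffusion marginal.

```python
import math
import random


def conditional_forward_sample(x0, y, alpha_bar_t, m_t, rng):
    """Draw one sample x_t from a hypothetical conditional forward step.

    x0          -- clean speech samples (list of floats)
    y           -- observed noisy speech samples (same length as x0)
    alpha_bar_t -- cumulative signal-retention factor in (0, 1]
    m_t         -- assumed interpolation weight in [0, 1]; 0 ignores y
    rng         -- random.Random instance supplying the Gaussian noise
    """
    scale = math.sqrt(alpha_bar_t)
    # Assumed variance: equals the standard (1 - alpha_bar_t) when m_t = 0.
    # Clamped at zero so an aggressive m_t cannot yield a negative variance.
    var = max((1.0 - alpha_bar_t) - (m_t ** 2) * alpha_bar_t, 0.0)
    std = math.sqrt(var)
    return [
        # Mean is a convex combination of clean and noisy samples.
        scale * ((1.0 - m_t) * clean + m_t * noisy) + std * rng.gauss(0.0, 1.0)
        for clean, noisy in zip(x0, y)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.2, -0.1, 0.4]
    noisy = [0.5, 0.3, -0.2]
    print(conditional_forward_sample(clean, noisy, alpha_bar_t=0.9, m_t=0.3, rng=rng))
```

With `m_t = 0` the step reduces to the unconditional Gaussian diffusion step, which is the sense in which such a formulation would generalize the standard model; as `m_t` grows, the sample mean moves toward the noisy observation instead of toward pure Gaussian noise.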