Diffusion models have recently shown promising results on difficult enhancement tasks such as the conditional and unconditional restoration of natural images and audio signals. In this work, we explore the possibility of leveraging a recently proposed iterative diffusion model, cold diffusion, to recover clean speech from noisy signals. The unique mathematical properties of the cold diffusion sampling process can be used to restore high-quality samples from arbitrary degradations. Based on these properties, we propose an improved training algorithm and objective that help the model generalize better during sampling. We validate the proposed framework with two model architectures. Experimental results on the benchmark speech enhancement dataset VoiceBank-DEMAND demonstrate the strong performance of the proposed approach compared to representative discriminative models and diffusion-based enhancement models.
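To make the sampling process mentioned above concrete, the following is a minimal sketch of a cold-diffusion-style sampling loop, not the implementation used in this work. It assumes the update x_{t-1} = x_t - D(x0_hat, t) + D(x0_hat, t-1) from the cold diffusion formulation, and it assumes, purely for illustration, a degradation D that linearly interpolates between the clean estimate and the noisy input. The names `degrade`, `restore`, and `cold_diffusion_sample` are hypothetical, and `restore` stands in for a trained enhancement network.

```python
from typing import Callable, List

Signal = List[float]


def degrade(clean: Signal, noisy: Signal, t: int, num_steps: int) -> Signal:
    """Hypothetical degradation D(x0, t): linear interpolation between
    the clean estimate (t = 0) and the noisy observation (t = num_steps).
    This choice is an assumption made for illustration only."""
    w = t / num_steps
    return [(1.0 - w) * c + w * n for c, n in zip(clean, noisy)]


def cold_diffusion_sample(
    noisy: Signal,
    restore: Callable[[Signal, int], Signal],
    num_steps: int,
) -> Signal:
    """Iteratively move from the noisy signal toward a clean estimate.

    At each step the restoration model predicts the clean signal, and the
    current state is shifted by the difference between two consecutive
    degradation levels of that prediction.
    """
    x = list(noisy)  # start from the fully degraded (noisy) signal
    for t in range(num_steps, 0, -1):
        x0_hat = restore(x, t)  # model's estimate of the clean signal
        d_t = degrade(x0_hat, noisy, t, num_steps)
        d_prev = degrade(x0_hat, noisy, t - 1, num_steps)
        # Cold-diffusion-style update: x_{t-1} = x_t - D(x0_hat, t) + D(x0_hat, t-1)
        x = [xi - a + b for xi, a, b in zip(x, d_t, d_prev)]
    return x


if __name__ == "__main__":
    # Toy check with an oracle restorer that always returns the true clean signal.
    clean_ref = [0.2, 0.1, -0.3]
    noisy_obs = [1.0, -0.5, 0.25]
    enhanced = cold_diffusion_sample(noisy_obs, lambda x, t: clean_ref, num_steps=10)
    print(enhanced)
```

With this linear degradation, each step shifts the state by 1/num_steps of the gap between the current clean estimate and the noisy input, so an error in any single prediction contributes only a small part of the final result. In practice `restore` would be a neural network conditioned on the step index.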