Recently, score-based generative models have been successfully employed for the task of speech enhancement. A stochastic differential equation is used to model the iterative forward process, where at each step environmental noise and white Gaussian noise are added to the clean speech signal. While in the limit the mean of the forward process ends at the noisy mixture, in practice the process stops earlier, so its mean reaches only an approximation of the noisy mixture. This results in a discrepancy between the terminating distribution of the forward process and the prior used for solving the reverse process at inference. In this paper, we address this discrepancy. To this end, we propose a forward process based on a Brownian bridge and show that such a process reduces the mismatch compared to previous diffusion processes. More importantly, we show that our approach improves on the baseline process in objective metrics while using only half of the iteration steps and having one fewer hyperparameter to tune.
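The mismatch described above can be illustrated with a minimal sketch that compares the mean of two forward processes at the final time. It is not the paper's implementation: the exponential (Ornstein-Uhlenbeck-style) form of the baseline mean, the stiffness value `gamma`, the final time `T = 1`, and the toy signals are all assumptions chosen for illustration. The only property shown is that a bridge mean is pinned to the noisy mixture at the final time, whereas an exponentially decaying mean only approaches it.

```python
import math

def ou_mean(x0, y, t, gamma=1.5):
    """Mean of an OU-style forward process at time t.

    Assumed form: exp(-gamma*t) * x0 + (1 - exp(-gamma*t)) * y.
    gamma is a hypothetical stiffness value, not taken from the paper.
    """
    w = math.exp(-gamma * t)
    return [w * a + (1.0 - w) * b for a, b in zip(x0, y)]

def bridge_mean(x0, y, t, T=1.0):
    """Mean of a Brownian bridge pinned at x0 (t = 0) and y (t = T)."""
    w = 1.0 - t / T
    return [w * a + (1.0 - w) * b for a, b in zip(x0, y)]

def max_gap(mean, y):
    """Largest absolute deviation between a process mean and the mixture y."""
    return max(abs(m - b) for m, b in zip(mean, y))

# Toy samples standing in for clean speech and the noisy mixture.
x0 = [0.2, -0.1, 0.4]
y = [0.5, 0.3, -0.2]

ou_gap = max_gap(ou_mean(x0, y, 1.0), y)          # residual mismatch at t = 1
bridge_gap = max_gap(bridge_mean(x0, y, 1.0), y)  # bridge ends at y
print(ou_gap, bridge_gap)
```

In this sketch the OU-style gap shrinks only as `gamma` grows, which is one way to read the abstract's remark that the mean reaches the noisy mixture only in the limit.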