Modern neural speech enhancement models usually include some form of phase information in their training loss terms, either explicitly or implicitly. However, these loss terms are typically designed to reduce the distortion of phase spectrum values at individual frequencies, and as a result they do not significantly improve the quality of the enhanced speech. In this paper, we propose an effective phase reconstruction strategy for neural speech enhancement that can operate in noisy environments. Specifically, we introduce a phase continuity loss that considers relative phase variations across the time and frequency axes. By including this phase continuity loss in a state-of-the-art neural speech enhancement system trained with a reconstruction loss and several magnitude spectral losses, we show that the proposed method further improves the quality of enhanced speech signals over the baseline, especially when training is done jointly with a magnitude spectrum loss.
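To make the idea of a loss on relative phase variations concrete, the following is a minimal NumPy sketch and not the formulation used in the paper. It assumes phase spectrograms of shape (frequency bins, frames), takes first-order differences along each axis, and compares estimate and reference with a wrapped L1 distance. The function names `wrap_phase` and `phase_continuity_loss`, the L1 distance, and the equal weighting of the two terms are all illustrative assumptions.

```python
import numpy as np


def wrap_phase(x):
    """Wrap angles (radians) into the principal interval [-pi, pi)."""
    return (x + np.pi) % (2.0 * np.pi) - np.pi


def phase_continuity_loss(phase_est, phase_ref):
    """Illustrative loss on phase differences (assumed form, not the paper's).

    Both inputs are phase spectrograms in radians with shape
    (freq_bins, frames). The loss compares how the phase changes between
    neighbouring frames and neighbouring bins, not the absolute phase values.
    """
    # Relative phase variation along the time axis (frame to frame).
    dt_est = np.diff(phase_est, axis=1)
    dt_ref = np.diff(phase_ref, axis=1)

    # Relative phase variation along the frequency axis (bin to bin).
    df_est = np.diff(phase_est, axis=0)
    df_ref = np.diff(phase_ref, axis=0)

    # Wrap the mismatch so that errors of 2*pi count as zero.
    time_term = np.mean(np.abs(wrap_phase(dt_est - dt_ref)))
    freq_term = np.mean(np.abs(wrap_phase(df_est - df_ref)))

    # Equal weighting of the two terms is an arbitrary choice for this sketch.
    return time_term + freq_term
```

Because only differences enter the loss, adding a constant offset to every phase value leaves it unchanged, whereas a loss on absolute phase values at each bin would penalize that offset. In a training setup, a term of this kind would be added to the reconstruction and magnitude spectral losses with a tunable weight.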