Speech enhancement (SE) performance has improved considerably owing to the use of deep learning models as the base function. Herein, we propose a perceptual contrast stretching (PCS) approach to further improve SE performance. PCS is derived from the critical band importance function and is applied to modify the targets of the SE model. Specifically, the contrast of the target features is stretched according to perceptual importance, thereby improving overall SE performance. Compared with post-processing-based implementations, incorporating PCS into the training phase preserves performance while reducing online computation. Notably, PCS can be combined with different SE model architectures and training criteria. Furthermore, PCS does not affect the causality or convergence of SE model training. Experimental results on the VoiceBank-DEMAND dataset show that the proposed method achieves state-of-the-art performance on both causal (PESQ score = 3.07) and noncausal (PESQ score = 3.35) SE tasks.
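The core idea of stretching target-feature contrast by perceptual importance can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-bin weight profile here is a hypothetical placeholder, whereas the paper derives its weights from the critical band importance function.

```python
import numpy as np

def pcs_target(mag, band_weights):
    """Sketch of perceptual contrast stretching on an SE training target.

    mag: (freq_bins, frames) non-negative magnitude spectrogram.
    band_weights: (freq_bins,) per-bin perceptual importance weights
        (hypothetical values; the paper derives them from the
        critical band importance function).
    Returns a contrast-stretched feature in the compressed (log1p) domain,
    which would replace the clean target when training the SE model.
    """
    # Compress dynamic range so scaling acts on contrast, not raw energy.
    log_mag = np.log1p(mag)
    # Stretch contrast per frequency bin: perceptually important bands
    # (weights > 1) are emphasized relative to less important ones.
    return band_weights[:, None] * log_mag

# Toy usage with random magnitudes and a hypothetical weight profile.
rng = np.random.default_rng(0)
mag = np.abs(rng.normal(size=(257, 100)))
weights = np.linspace(1.0, 1.4, 257)  # assumed, for illustration only
target = pcs_target(mag, weights)
print(target.shape)  # (257, 100)
```

Because stretching is a fixed, element-wise transform of the target, it can be applied once offline during training, which is why the training-phase variant avoids any extra online computation at inference time.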