Acoustic echo and background noise can seriously degrade speech intelligibility. In practice, echo cancellation and noise suppression are usually treated as two separate tasks, each addressed with various digital signal processing (DSP) and deep learning techniques. In this paper, we propose a new cascaded model, the magnitude and complex temporal convolutional neural network (MC-TCN), to jointly perform acoustic echo cancellation and noise suppression with the help of adaptive filters. The MC-TCN cascades two separation cores, one to extract robust magnitude spectral features and the other to enhance magnitude and phase simultaneously. Experimental results show that the proposed method achieves superior performance, removing both echo and noise in real time. In terms of DECMOS, the subjective test shows that our method achieves a mean score of 4.41 and outperforms the INTERSPEECH 2021 AEC Challenge baseline by 0.54.
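To make the role of the adaptive filter concrete, the following is a minimal sketch of a linear echo-cancellation stage such as might precede a neural model. The abstract does not say which adaptive filter is used; the normalized least mean squares (NLMS) algorithm is assumed here as a common choice. The function names, tap count, step size, and simulated echo path are all hypothetical, and the two MC-TCN separation cores are not modeled.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract a linear echo estimate of `far_end` from `mic` using NLMS.

    This is an assumed stand-in for the adaptive filter mentioned in the
    abstract, not the authors' implementation.
    """
    weights = [0.0] * num_taps   # current estimate of the echo path
    history = [0.0] * num_taps   # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate    # residual after linear echo removal
        norm = sum(h * h for h in history) + eps
        # NLMS update: gradient step normalized by the input energy
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


def simulate(num_samples=4000, seed=0):
    """Build a toy far-end signal and an echo-only microphone signal."""
    rng = random.Random(seed)
    echo_path = [0.6, -0.3, 0.1]  # hypothetical room impulse response
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    mic = []
    for n in range(num_samples):
        echo = sum(
            echo_path[k] * far_end[n - k]
            for k in range(len(echo_path))
            if n - k >= 0
        )
        mic.append(echo)
    return far_end, mic, echo_path


if __name__ == "__main__":
    far_end, mic, echo_path = simulate()
    residual, weights = nlms_echo_canceller(far_end, mic)
    print("estimated echo path:", [round(w, 3) for w in weights[:3]])
```

In this toy setting the microphone signal contains only a linear echo, so the filter weights approach the simulated echo path and the residual decays toward zero. In a real system the residual would still contain near-end speech, noise, and nonlinear echo, which is the part a model such as MC-TCN would have to handle.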