With the increasing demand for audio communication and online conferencing, ensuring the robustness of Acoustic Echo Cancellation (AEC) in complicated acoustic scenarios involving noise, reverberation and nonlinear distortion has become a key issue. Although some traditional methods take nonlinear distortion into account, they remain inefficient at echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex-valued neural network to better model the important phase information, together with frequency-time LSTMs (F-T-LSTM), which scan along both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function so that the model achieves better echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the AEC-challenge baseline by 0.27 in terms of Mean Opinion Score (MOS).
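For readers unfamiliar with the cost function, the sketch below computes the standard scale-invariant signal-to-noise ratio (SI-SNR) between an estimated and a target signal. The abstract does not say how the paper modifies SI-SNR, so this is only the commonly used baseline definition, not the proposed loss; the function names `si_snr` and `si_snr_loss` and the `eps` stabilizer are illustrative assumptions.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Standard SI-SNR in dB for two equal-length 1-D signals (lists of floats).

    This is the common textbook definition, not the modified variant
    mentioned in the abstract, whose details are not given there.
    """
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")

    # Remove the mean of each signal so a DC offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target: s_target = (<e, t> / ||t||^2) * t.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t) + eps
    scale = dot / target_energy
    s_target = [scale * b for b in t]

    # Whatever is left after the projection is treated as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]

    signal_power = sum(s * s for s in s_target)
    noise_power = sum(n * n for n in e_noise) + eps
    return 10.0 * math.log10(signal_power / noise_power + eps)


def si_snr_loss(estimate, target):
    """Negative SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property that makes the measure scale-invariant.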