Acoustic echo cancellation (AEC) is designed to remove echoes, reverberation, and unwanted added sounds from the microphone signal while maintaining the quality of the near-end speaker's speech. This paper proposes adaptive speech quality complex neural networks that focus on specific tasks for real-time acoustic echo cancellation. Specifically, we propose a complex modularized neural network whose stages focus on feature extraction, acoustic separation, and mask optimization, respectively. Furthermore, we adopt a contrastive learning framework and novel speech-quality-aware loss functions to further improve performance. The model is trained with 72 hours for pre-training and then 72 hours for fine-tuning. The proposed model outperforms state-of-the-art methods.
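To make the mask stage concrete, the following is a minimal sketch of how a complex-valued mask could be applied to a microphone spectrum. The abstract does not state the mask formulation, so the complex ratio mask used here, the function names `apply_complex_mask` and `ideal_complex_mask`, and the toy array sizes are all assumptions for illustration, not the paper's method. In a trained system the mask would be predicted by the network; here an oracle mask is computed from a known near-end signal purely to show the arithmetic.

```python
import numpy as np


def apply_complex_mask(mic_spec: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Apply a complex mask to a complex spectrum by element-wise multiplication."""
    if mic_spec.shape != mask.shape:
        raise ValueError("mask and spectrum must have the same shape")
    return mask * mic_spec


def ideal_complex_mask(near_spec: np.ndarray, mic_spec: np.ndarray,
                       eps: float = 1e-8) -> np.ndarray:
    """Oracle complex ratio mask M such that M * mic_spec is close to near_spec.

    Hypothetical helper: a real model would predict the mask instead.
    """
    return near_spec * np.conj(mic_spec) / (np.abs(mic_spec) ** 2 + eps)


# Toy complex spectra with shape (frequency bins, frames); sizes are arbitrary.
rng = np.random.default_rng(0)
shape = (4, 5)
near = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
echo = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mic = near + echo  # microphone signal = near-end speech + echo

mask = ideal_complex_mask(near, mic)
estimate = apply_complex_mask(mic, mask)
```

Because the mask is complex, it adjusts both magnitude and phase in each time-frequency bin, which a real-valued magnitude mask cannot do.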