In hands-free communication systems, the acoustic coupling between loudspeaker and microphone generates an echo signal, which can severely degrade the quality of communication. Meanwhile, various types of noise in communication environments further reduce speech quality and intelligibility. It is difficult to extract the near-end signal from the microphone signal in a single step, especially in low signal-to-noise ratio scenarios. In this paper, we propose a deep complex network approach to address this issue. Specifically, we decompose stereophonic acoustic echo cancellation into two stages, a linear stereophonic acoustic echo cancellation module and a residual echo suppression module, both of which are based on deep learning architectures. A multi-frame filtering strategy is introduced to improve the estimation of the linear echo by capturing more inter-frame information. Moreover, we decouple the complex spectral mapping into magnitude estimation and complex spectrum refinement. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance compared with previous advanced algorithms under various conditions.
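The multi-frame filtering idea can be illustrated with a minimal NumPy sketch. It rests on assumptions not stated in the abstract: the linear echo in each time-frequency bin is modeled as a sum, over both far-end channels and the N most recent frames, of complex filter taps multiplied by the far-end short-time spectrum. The names `multi_frame_filter`, `far_end`, and `weights` are hypothetical. In the proposed approach the filter coefficients would presumably be estimated by the network, whereas here they are simply passed in as an array.

```python
import numpy as np

def multi_frame_filter(far_end, weights):
    """Apply per-bin multi-frame complex filters to a stereo far-end STFT.

    far_end: complex array, shape (C, T, F), C reference channels
    weights: complex array, shape (C, N, T, F), N taps per channel and bin
             (assumed layout; hypothetical, not taken from the paper)
    returns: complex array, shape (T, F), the estimated linear echo
    """
    C, T, F = far_end.shape
    n_taps = weights.shape[1]
    echo = np.zeros((T, F), dtype=complex)
    for tau in range(n_taps):
        # Delay the reference by tau frames, zero-padding the first frames.
        delayed = np.zeros_like(far_end)
        delayed[:, tau:, :] = far_end[:, :max(T - tau, 0), :]
        # Accumulate the filtered contribution of all loudspeaker channels.
        echo += np.sum(weights[:, tau] * delayed, axis=0)
    return echo
```

With a single tap (N = 1) this reduces to ordinary per-bin complex filtering of the current frame; additional taps let the estimate draw on preceding far-end frames, which is the inter-frame information the strategy is meant to capture.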