Acoustic echo cancellation (AEC) plays an important role in full-duplex speech communication, as well as in front-end speech enhancement for recognition when a loudspeaker is playing back audio. In this paper, we present an all-deep-learning framework that implicitly estimates the second-order statistics of echo/noise and target speech, and jointly performs echo and noise suppression through an attention-based recurrent neural network. The proposed model outperforms F-T-LSTM, the state-of-the-art method for joint echo cancellation and speech enhancement, in terms of objective speech quality metrics, speech recognition accuracy, and model complexity. We show that the model can be combined with a speaker embedding for better target speech enhancement. We further develop a branch for the automatic gain control (AGC) task, forming an all-in-one front-end speech enhancement system.
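To make the term "second-order statistics" concrete, the following is a minimal sketch of the explicit quantity that the abstract says the model estimates implicitly: a per-frequency covariance matrix across input channels. It is not the paper's method. The function name `channel_covariance`, the two-channel layout (microphone plus far-end reference), and the array shapes are assumptions chosen only for illustration.

```python
import numpy as np


def channel_covariance(stft):
    """Time-averaged covariance across channels, one matrix per frequency bin.

    stft: complex array of shape (channels, frames, freqs).
    Returns a complex array of shape (freqs, channels, channels).
    """
    frames = stft.shape[1]
    # For each frequency f: sum over frames t of x[c, t, f] * conj(x[d, t, f]).
    return np.einsum("ctf,dtf->fcd", stft, stft.conj()) / frames


# Hypothetical input: channel 0 = microphone, channel 1 = far-end reference.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 50, 129)) + 1j * rng.standard_normal((2, 50, 129))
phi = channel_covariance(x)
print(phi.shape)
```

Each matrix in `phi` is Hermitian. Its diagonal holds the per-channel power, and its off-diagonal entries hold the cross-correlation between the microphone and reference signals, which is the kind of information an echo canceller relies on.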