In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), which is a recurrent neural network (RNN) augmented with self-attention. Next, an ARN is introduced along the spatial dimension for spatial context aggregation. TPARN is designed as a multiple-input and multiple-output architecture to enhance all input channels simultaneously. Experimental results demonstrate the superiority of TPARN over existing state-of-the-art approaches.
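The ordering of the three paths can be illustrated with a minimal sketch. It is not the TPARN implementation: it assumes the usual dual-path layout, in which each channel is split into chunks of frames (indexed here as `[channel][chunk][frame]`), and it swaps the ARN for a hypothetical stand-in, `running_mean`, that only mixes information along one sequence. The names `apply_along_axis` and `triple_path_block` are likewise illustrative. The sketch shows only which axis each path treats as the sequence and which axes are left as batch.

```python
from itertools import product
from typing import Callable, List

Sequence1D = List[float]
Tensor3D = List[List[List[float]]]  # indexed as [channel][chunk][frame]


def running_mean(seq: Sequence1D) -> Sequence1D:
    """Hypothetical stand-in for an ARN: mixes information along one sequence."""
    out, total = [], 0.0
    for count, value in enumerate(seq, start=1):
        total += value
        out.append(total / count)
    return out


def apply_along_axis(
    x: Tensor3D, axis: int, fn: Callable[[Sequence1D], Sequence1D]
) -> Tensor3D:
    """Apply fn to every 1-D sequence along `axis`; the other two axes act as batch."""
    shape = [len(x), len(x[0]), len(x[0][0])]
    out = [[[0.0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    a, b = [d for d in range(3) if d != axis]
    for i, j in product(range(shape[a]), range(shape[b])):
        def locate(k: int) -> List[int]:
            idx = [0, 0, 0]
            idx[axis], idx[a], idx[b] = k, i, j
            return idx

        # Gather the sequence, transform it, and scatter it back in place.
        positions = [locate(k) for k in range(shape[axis])]
        seq = [x[c][n][t] for c, n, t in positions]
        for (c, n, t), value in zip(positions, fn(seq)):
            out[c][n][t] = value
    return out


def triple_path_block(
    x: Tensor3D, fn: Callable[[Sequence1D], Sequence1D] = running_mean
) -> Tensor3D:
    """Run the three paths in the order described in the text."""
    x = apply_along_axis(x, axis=2, fn=fn)  # intra-chunk path, per channel
    x = apply_along_axis(x, axis=1, fn=fn)  # inter-chunk path, per channel
    x = apply_along_axis(x, axis=0, fn=fn)  # spatial path, across channels
    return x


if __name__ == "__main__":
    # Two channels, one chunk, two frames per chunk.
    noisy = [[[1.0, 3.0]], [[3.0, 5.0]]]
    print(triple_path_block(noisy))
```

In this sketch the first two calls never read across the channel axis, so each channel is processed independently, and only the third call aggregates spatial context. The output has the same shape as the input, which mirrors the multiple-input and multiple-output design: every input channel receives its own output.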