Most existing deep learning-based binaural speaker separation systems focus on producing a monaural estimate for each target speaker, and thus do not preserve the interaural cues that are crucial for human listeners to perform sound localization and lateralization. In this study, we address talker-independent binaural speaker separation with interaural cues preserved in the estimated binaural signals. Specifically, we extend a newly developed gated recurrent neural network for monaural separation by additionally incorporating self-attention mechanisms and dense connectivity. We develop an end-to-end multiple-input multiple-output (MIMO) system that directly maps the binaural waveforms of the mixture to those of the individual speech signals. Experimental results show that the proposed approach achieves significantly better separation performance than a recent binaural separation approach. In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.
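The architecture is only sketched at a high level above. Below is a minimal PyTorch sketch, assuming a simplified end-to-end MIMO separator built from bidirectional GRU blocks with self-attention and dense (concatenative) connectivity; the learned convolutional encoder/decoder, module names, and layer sizes are illustrative assumptions rather than the authors' actual model.

```python
# Illustrative sketch only: a simplified MIMO binaural separator with
# GRU + self-attention blocks and dense connectivity (not the paper's exact model).
import torch
import torch.nn as nn

class AttentiveDenseGRUBlock(nn.Module):
    """Bidirectional GRU followed by self-attention, with a dense (concatenative) skip."""
    def __init__(self, in_dim, hidden_dim, num_heads=4):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(2 * hidden_dim)

    def forward(self, x):
        h, _ = self.gru(x)                 # (B, T, 2*hidden)
        a, _ = self.attn(h, h, h)          # self-attention over time frames
        h = self.norm(h + a)
        return torch.cat([x, h], dim=-1)   # dense connection: carry input features forward

class BinauralMIMOSeparator(nn.Module):
    """Maps a 2-channel (binaural) mixture waveform to 2-channel waveforms per speaker."""
    def __init__(self, num_speakers=2, win=256, hop=128, hidden_dim=128, num_blocks=3):
        super().__init__()
        # Learned waveform frontend operating jointly on both ear signals.
        self.encoder = nn.Conv1d(2, hidden_dim, kernel_size=win, stride=hop)
        dim, blocks = hidden_dim, []
        for _ in range(num_blocks):
            blocks.append(AttentiveDenseGRUBlock(dim, hidden_dim))
            dim += 2 * hidden_dim          # feature size grows with each dense connection
        self.blocks = nn.ModuleList(blocks)
        self.mask = nn.Linear(dim, num_speakers * hidden_dim)
        self.decoder = nn.ConvTranspose1d(hidden_dim, 2, kernel_size=win, stride=hop)
        self.num_speakers = num_speakers

    def forward(self, mix):                        # mix: (B, 2, samples)
        feats = self.encoder(mix).transpose(1, 2)  # (B, T, hidden)
        x = feats
        for blk in self.blocks:
            x = blk(x)
        masks = torch.sigmoid(self.mask(x))        # (B, T, spk*hidden)
        masks = masks.view(x.size(0), x.size(1), self.num_speakers, -1)
        outs = []
        for s in range(self.num_speakers):
            masked = (feats * masks[:, :, s]).transpose(1, 2)  # (B, hidden, T)
            outs.append(self.decoder(masked))                  # (B, 2, samples) per speaker
        return torch.stack(outs, dim=1)            # (B, speakers, 2 ears, samples)

# Usage sketch: one second of binaural audio at an assumed 16 kHz sampling rate.
model = BinauralMIMOSeparator()
mixture = torch.randn(1, 2, 16000)
estimates = model(mixture)                          # shape (1, 2, 2, 16000)
```

Because each speaker estimate is decoded as a two-channel waveform rather than a single channel, the network can in principle retain interaural time and level differences, which is the property the proposed system is designed to preserve.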