When multiple people talk simultaneously, the spatial characteristics of the signals are the most distinctive feature for extracting the target signal. In this work, we develop a deep joint spatial-spectral non-linear filter that can be steered toward an arbitrary target direction. To this end, we propose a simple and effective conditioning mechanism that sets the initial state of the filter's recurrent layers based on the target direction. We show that this scheme is more effective than the baseline approach and increases the flexibility of the filter at no performance cost. As we demonstrate in this paper, the resulting spatially selective non-linear filters can also be used to separate an arbitrary number of speakers and enable very accurate multi-speaker localization.
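The conditioning idea, setting a recurrent layer's initial state from the target direction, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-hot encoding over equal angular sectors, the plain Elman recurrence (the actual filter may use a different recurrent layer type), the random untrained weights, and all names such as `one_hot_direction`, `initial_state`, and `rnn_forward`.

```python
import math
import random


def one_hot_direction(angle_deg, num_bins):
    """Encode a target angle as a one-hot vector over equal sectors (assumed encoding)."""
    width = 360.0 / num_bins
    index = int((angle_deg % 360.0) // width) % num_bins
    return [1.0 if i == index else 0.0 for i in range(num_bins)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def initial_state(direction, w_cond):
    """Map the direction encoding to the recurrent layer's initial hidden state."""
    return [math.tanh(v) for v in matvec(w_cond, direction)]


def rnn_forward(inputs, h0, w_in, w_rec):
    """Run h_t = tanh(W_in x_t + W_rec h_{t-1}), starting from the given h0."""
    h = h0
    states = []
    for x in inputs:
        a = matvec(w_in, x)
        b = matvec(w_rec, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        states.append(h)
    return states


def random_matrix(rows, cols, rng, scale=0.5):
    """Untrained placeholder weights; a real filter would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
NUM_BINS, HIDDEN, FEATURES = 8, 4, 3  # hypothetical sizes
w_cond = random_matrix(HIDDEN, NUM_BINS, rng)
w_in = random_matrix(HIDDEN, FEATURES, rng)
w_rec = random_matrix(HIDDEN, HIDDEN, rng)

# The same input frames are processed twice, steered to two different directions.
frames = [[rng.uniform(-1, 1) for _ in range(FEATURES)] for _ in range(5)]
h0_0 = initial_state(one_hot_direction(0.0, NUM_BINS), w_cond)
h0_90 = initial_state(one_hot_direction(90.0, NUM_BINS), w_cond)
out_0 = rnn_forward(frames, h0_0, w_in, w_rec)
out_90 = rnn_forward(frames, h0_90, w_in, w_rec)
```

In this sketch the target direction enters the network only through the initial state, so the same weights and the same input frames yield different hidden-state trajectories for different steering angles. That is the sense in which a single filter can be pointed in an arbitrary direction without changing its per-frame input.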