Localizing multiple moving sound sources in real-world scenarios remains challenging because of the interaction between sources, time-varying trajectories, distorted spatial cues, and similar factors. In this work, we propose to use deep learning to learn the competing and time-varying direct-path phase differences needed to localize multiple moving sound sources. A causal convolutional recurrent neural network is designed to extract the direct-path phase difference sequence from the signals of each microphone pair. Predicting multiple targets simultaneously raises two problems: assignment ambiguity and an output dimension that is not known in advance. To avoid both, the learning target is designed as a weighted sum, which encodes source activity in the weights and the direct-path phase differences in the summed values. The learned direct-path phase differences for all microphone pairs can be used directly to construct the spatial spectrum according to the formulation of steered response power (SRP). This deep neural network (DNN) based SRP method is referred to as SRP-DNN. Source locations are estimated by iteratively detecting the dominant source in the spatial spectrum and removing it, which reduces the interaction between sources. Experimental results on both simulated and real-world data show the superiority of the proposed method in the presence of noise and reverberation.
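The last two steps of the pipeline (building an SRP-style spatial spectrum from per-pair phase differences, then iteratively detecting and removing the dominant source) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: a far-field, azimuth-only model, free-field template phase differences, the function names `template_ipd`, `srp_spectrum` and `detect_and_remove`, the threshold value, and the projection-style removal rule. The network itself is not modeled; its output is stood in for by a complex array of shape (pairs, frequencies).

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def template_ipd(mic_pos, pairs, freqs, azimuths):
    """Far-field direct-path phase differences, shape (pairs, freqs, directions)."""
    # Unit vectors pointing towards each candidate azimuth (2-D, far field).
    u = np.stack([np.cos(azimuths), np.sin(azimuths)], axis=0)          # (2, D)
    baselines = np.array([mic_pos[a] - mic_pos[b] for a, b in pairs])   # (M, 2)
    tdoa = baselines @ u / C                                            # (M, D)
    return np.exp(1j * 2 * np.pi * freqs[None, :, None] * tdoa[:, None, :])


def srp_spectrum(ipd, templates):
    """Correlate the per-pair phase differences with every candidate template."""
    # ipd: (M, F) complex, templates: (M, F, D) complex.
    n_pairs, n_freqs, _ = templates.shape
    power = np.einsum("mf,mfd->d", ipd, np.conj(templates))
    return np.real(power) / (n_pairs * n_freqs)


def detect_and_remove(ipd, templates, max_sources=2, threshold=0.3):
    """Pick the dominant peak, subtract its template, and repeat."""
    residual = ipd.copy()
    found = []
    for _ in range(max_sources):
        spectrum = srp_spectrum(residual, templates)
        peak = int(np.argmax(spectrum))
        if spectrum[peak] < threshold:
            break  # nothing left that looks like an active source
        found.append(peak)
        # Assumed removal rule: use the peak height as the source weight.
        residual = residual - spectrum[peak] * templates[:, :, peak]
    return found
```

In this sketch, a weighted sum such as `1.0 * T[:, :, i] + 0.8 * T[:, :, j]`, with `T` returned by `template_ipd`, plays the role of the learned target for two active sources. Because the spectrum is linear in the phase-difference input, subtracting the detected source's template zeroes the spectrum at that peak, so the next iteration searches for the remaining source with less interference from the first.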