Recent data- and learning-based sound source localization (SSL) methods have shown strong performance in challenging acoustic scenarios. However, little work has been done on adapting such methods to consistently track multiple sources that appear and disappear over time, as happens in real scenes. In this paper, we present a new training strategy for deep learning SSL models. Its implementation is straightforward: the loss is the mean squared error of the optimal association between estimated and reference positions in the preceding time frames. The strategy optimizes the desired properties of a tracking system: handling a time-varying number of sources and ordering localization estimates according to their trajectories, which minimizes identity switches (IDSs). Evaluation on simulated data of multiple moving sources in reverberant conditions, using two model architectures, demonstrates its effectiveness in reducing IDSs without compromising frame-wise localization accuracy.
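As an illustration of the idea of scoring a frame under the association that is optimal over the preceding frames, the following is a minimal sketch and not the authors' implementation. It assumes a fixed number of sources, Cartesian position tuples, and a brute-force search over permutations; the function name `windowed_association_mse` and the `window` parameter are hypothetical, and the handling of a time-varying number of sources is left out.

```python
from itertools import permutations


def squared_error(p, q):
    """Squared Euclidean distance between two position tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def windowed_association_mse(est, ref, t, window):
    """Hypothetical sketch of a windowed, association-based MSE.

    est, ref: lists over frames; each frame is a list of K position tuples.
    The output-to-reference assignment is chosen to minimize the summed
    squared error over frames [t - window + 1, t], and the MSE of frame t
    is then computed under that single assignment.
    """
    start = max(0, t - window + 1)
    frames = range(start, t + 1)
    n_src = len(ref[t])

    best_perm, best_cost = None, float("inf")
    # Brute-force search: fine for a handful of sources (assumption).
    for perm in permutations(range(n_src)):
        cost = sum(
            squared_error(est[f][perm[k]], ref[f][k])
            for f in frames
            for k in range(n_src)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost

    dim = len(ref[t][0])
    frame_cost = sum(
        squared_error(est[t][best_perm[k]], ref[t][k]) for k in range(n_src)
    )
    return frame_cost / (n_src * dim), best_perm
```

With `window=1` this reduces to a frame-wise permutation-invariant error, which cannot see an identity switch. With a longer window, an estimate that swaps output slots in the current frame is scored against the association established by the earlier frames, so the switch is penalized.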