Sound source localization aims to estimate the direction of arrival (DOA) of every sound source from observed multi-channel audio. When the number of sources is unknown, as is common in practice, existing localization algorithms predict a likelihood-based coding (i.e., a spatial spectrum) and apply a pre-determined threshold to detect the number of sources and their corresponding DOA values. However, these threshold-based algorithms are unstable because their performance depends on a careful choice of threshold. To address this problem, we propose an iterative sound source localization approach called ISSL, which extracts each source's DOA one at a time, without a threshold, until a termination criterion is met. Unlike threshold-based algorithms, ISSL uses an active source detector network, built on a binary classifier, that takes the residual spatial spectrum as input and decides whether to stop the iteration. As a result, ISSL can handle an arbitrary number of sources, even more than the number seen during training. Experimental results show that ISSL achieves significant performance improvements in both DOA estimation and source number detection compared with existing threshold-based algorithms.
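The iterate-then-check control flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the ISSL implementation: the spatial spectrum is a 360-bin array with one Gaussian-shaped peak per source, the extraction step is a plain `argmax`, the residual is formed by subtracting a scaled Gaussian, and the names `gaussian_bump`, `iterative_localize`, and `toy_detector` are hypothetical. In particular, `toy_detector` is only a placeholder for the trained binary classifier, so it does not reproduce the threshold-free behavior of the actual method.

```python
import numpy as np


def gaussian_bump(center_deg, n_bins=360, sigma_deg=8.0):
    """Circular Gaussian-shaped peak centered at center_deg (assumed spectrum coding)."""
    angles = np.arange(n_bins)
    diff = np.abs(angles - center_deg)
    dist = np.minimum(diff, n_bins - diff)  # wrap-around angular distance
    return np.exp(-0.5 * (dist / sigma_deg) ** 2)


def iterative_localize(spectrum, has_active_source, max_iters=10):
    """Extract one DOA per iteration until the detector reports no active source.

    has_active_source is any callable mapping the residual spectrum to a bool;
    in the paper's setting this role is played by a trained binary classifier.
    """
    residual = np.asarray(spectrum, dtype=float).copy()
    doas = []
    for _ in range(max_iters):
        if not has_active_source(residual):
            break  # termination criterion met
        peak = int(np.argmax(residual))  # assumed extraction step: strongest peak
        doas.append(peak)
        # Assumed residual update: remove the extracted source's contribution.
        removed = residual[peak] * gaussian_bump(peak, residual.size)
        residual = np.clip(residual - removed, 0.0, None)
    return doas


def toy_detector(residual):
    """Placeholder for the learned active source detector (NOT the real network)."""
    return float(residual.max()) > 0.5


# Synthetic spectrum with three well-separated sources.
spectrum = gaussian_bump(40) + gaussian_bump(170) + gaussian_bump(300)
print(iterative_localize(spectrum, toy_detector))
```

On this synthetic input the loop recovers the three peaks at 40, 170, and 300 degrees and then stops. Because the detector is passed in as a callable, a trained classifier could be substituted without changing the loop, which is what would let the number of extracted sources vary freely at inference time.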