In this work, we extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network and Ambisonics signals. We significantly improve the performance of the baseline network by changing the arrangement of its convolutional and pooling layers. We propose several configurations with more convolutional layers and smaller pooling sizes in between, so that less information is lost from one layer to the next, leading to better feature extraction. In parallel, we test the system's ability to localize up to three sources, the case in which the improved feature extraction provides the largest boost in accuracy. We evaluate and compare these improved configurations on synthetic and real-world data. The results show a substantial improvement in multi-source localization performance over the baseline network.
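The idea of trading a few large pooling steps for many small ones can be illustrated with a short sketch. The pooling sizes and the number of input frequency bins below are hypothetical values chosen for illustration; they are not the configurations evaluated in this work. The helper `frequency_bins_after_pooling` is likewise an assumed name. Both layouts reduce the feature map by the same overall factor, but the deeper one halves it at each stage, so every convolutional layer sees a representation that has lost less resolution than in the shallow layout.

```python
from math import prod


def frequency_bins_after_pooling(n_bins, pool_sizes):
    """Return the number of frequency bins remaining after each pooling stage.

    Assumes non-overlapping pooling (stride equal to the pool size) applied
    along the frequency axis only.
    """
    remaining = []
    for size in pool_sizes:
        n_bins //= size
        remaining.append(n_bins)
    return remaining


# Hypothetical layouts, for illustration only (not taken from the paper).
shallow_pools = [8, 8, 4]                # few conv blocks, aggressive pooling
deep_pools = [2, 2, 2, 2, 2, 2, 2, 2]    # more conv blocks, gentle pooling

n_input_bins = 512  # assumed number of input frequency bins

shallow_sizes = frequency_bins_after_pooling(n_input_bins, shallow_pools)
deep_sizes = frequency_bins_after_pooling(n_input_bins, deep_pools)

# Same overall reduction factor, reached through different intermediate sizes.
print("total reduction:", prod(shallow_pools), prod(deep_pools))
print("shallow:", shallow_sizes)
print("deep:   ", deep_sizes)
```

In this sketch the first pooling stage of the shallow layout discards seven of every eight bins at once, whereas the deep layout discards only one of every two, which is the sense in which less information is lost between consecutive layers.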