Sound event localization and detection is a novel area of research that emerged from the combined interest in analyzing the acoustic scene in terms of both the spatial and the temporal activity of sounds of interest. This paper presents an overview of the first international evaluation on sound event localization and detection, organized as a task of the DCASE 2019 Challenge. A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training learning-based approaches and for evaluating the submissions on an unlabeled subset. The overview presents in detail how the systems were evaluated and ranked, along with the characteristics of the best-performing systems. Common strategies in terms of input features, model architectures, training approaches, exploitation of prior knowledge, and data augmentation are discussed. Since ranking in the challenge was based on evaluating localization and event classification performance individually, part of the overview focuses on presenting metrics for the joint measurement of the two, together with a reevaluation of the submissions using these new metrics. The new analysis reveals submissions that performed better on the joint task of detecting the correct type of event close to its original location than some of the submissions that were ranked higher in the challenge. Consequently, the ranking of submissions that performed strongly when evaluated separately on detection or localization, but not jointly on both, was negatively affected.
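To make the idea of a joint measurement concrete, the following is a minimal sketch of a location-aware detection score: a predicted event counts as correct only if its class matches a reference event in the same frame and its estimated direction lies within an angular threshold of the reference direction. This is an illustration only, not necessarily the exact metric definition used in the overview. The function names, the frame layout (one dictionary per frame, mapping a class label to an azimuth/elevation pair in degrees, with at most one event per class per frame), and the 20-degree default threshold are all assumptions made for this example.

```python
import math


def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs in degrees."""
    az1, el1 = (math.radians(v) for v in a)
    az2, el2 = (math.radians(v) for v in b)
    cos_angle = (math.sin(el1) * math.sin(el2)
                 + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def location_aware_f1(reference, predicted, threshold_deg=20.0):
    """Frame-wise F1 where a true positive needs the right class AND a close direction.

    `reference` and `predicted` are equal-length lists of frames (assumed layout);
    each frame is a dict {class_label: (azimuth_deg, elevation_deg)}.
    """
    tp = fp = fn = 0
    for ref_frame, pred_frame in zip(reference, predicted):
        for label, pred_doa in pred_frame.items():
            ref_doa = ref_frame.get(label)
            if ref_doa is not None and angular_distance_deg(ref_doa, pred_doa) <= threshold_deg:
                tp += 1  # correct class, close enough to the reference direction
            else:
                fp += 1  # wrong class, or correct class but too far away
        for label, ref_doa in ref_frame.items():
            pred_doa = pred_frame.get(label)
            if pred_doa is None or angular_distance_deg(ref_doa, pred_doa) > threshold_deg:
                fn += 1  # reference event missed or localized too far away
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Two frames: every class is detected, but one "speech" estimate is far off in direction.
reference = [{"speech": (30, 0)}, {"speech": (30, 0), "knock": (-90, 10)}]
predicted = [{"speech": (35, 0)}, {"speech": (120, 0), "knock": (-85, 10)}]
joint_f1 = location_aware_f1(reference, predicted)
class_only_f1 = location_aware_f1(reference, predicted, threshold_deg=180.0)
print(joint_f1, class_only_f1)
```

In this toy case every event class is detected in every frame, so a detection-only view (emulated here by a 180-degree threshold) gives a perfect score, while the joint score is lower because one estimate is 90 degrees away from its reference. A system can therefore look strong when detection is scored alone and weaker when both are scored together, which is the effect the reevaluation describes.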