This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD). The dataset is based on emulation of real recordings of static or moving sound events under real conditions of reverberation and ambient noise, using spatial room impulse responses captured in a variety of rooms, and is delivered in two spatial formats. The acoustical synthesis remains the same as in the previous iteration of the challenge, but the new dataset brings more challenging conditions of polyphony and of overlapping instances of the same class. The most important difference in the new dataset is the introduction of directional interferers: sound events that are localized in space but do not belong to the target classes to be detected and are not annotated. Since such interfering events are expected in every real-world SELD scenario, the new dataset aims to promote systems that deal with this condition effectively. A modified SELDnet baseline employing the recent ACCDOA representation of SELD problems accompanies the dataset and is shown to outperform the previous baseline. The new dataset is shown to be significantly more challenging for both baselines according to all considered metrics. To investigate the individual and combined effects of ambient noise, interferers, and reverberation, we study the performance of the baseline on different versions of the dataset that exclude or include combinations of these factors. The results indicate that by far the most detrimental effects are caused by directional interferers.
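As an illustration of the ACCDOA representation mentioned above, the following is a minimal sketch and not the baseline's actual code. It assumes the commonly described formulation in which each class in each frame is assigned a single Cartesian vector whose direction is the direction of arrival and whose length is the event activity. The function names, the azimuth/elevation convention, and the 0.5 activity threshold are assumptions made for this example.

```python
import math


def encode_accdoa(active, azimuth_deg, elevation_deg):
    """Build a hypothetical ACCDOA target (x, y, z) for one class in one frame.

    An inactive class maps to the zero vector; an active class maps to the
    unit vector pointing at its direction of arrival.
    """
    if not active:
        return (0.0, 0.0, 0.0)
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Assumed convention: x forward, y left, z up.
    return (
        math.cos(az) * math.cos(el),
        math.sin(az) * math.cos(el),
        math.sin(el),
    )


def decode_accdoa(vec, threshold=0.5):
    """Recover (active, azimuth_deg, elevation_deg) from a predicted vector.

    The vector length acts as the activity score; the threshold value is an
    assumption for this sketch.
    """
    x, y, z = vec
    norm = math.sqrt(x * x + y * y + z * z)
    if norm <= threshold:
        return (False, None, None)
    azimuth = math.degrees(math.atan2(y, x))
    # Clamp to guard against rounding slightly outside [-1, 1].
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    return (True, azimuth, elevation)
```

In this formulation detection and localization share one regression output per class, so no separate activity branch is needed; a short vector simply means that the class is treated as inactive.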