This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset for sound event localization and detection, comprising spatial recordings of real scenes collected in various interiors at two different sites. The dataset is captured with a high-resolution spherical microphone array and delivered in two 4-channel formats: first-order Ambisonics and tetrahedral microphone array. Sound events belonging to 13 target sound classes are annotated both temporally and spatially through a combination of human annotation and optical tracking. The dataset serves as the development and evaluation dataset for Task 3 of the DCASE2022 Challenge on Sound Event Localization and Detection, and it introduces significant new challenges for the task compared to the previous iterations, which were based on synthetic spatialized sound scene recordings. The report details the dataset specifications, including the recording and annotation process, the target classes and their presence, and the development and evaluation splits. Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on its differences from the baseline of the previous iterations; namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format. Results of the baseline indicate that, with a suitable training strategy, reasonable detection and localization performance can be achieved on real sound scene recordings. The dataset is available at https://zenodo.org/record/6387880.
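To illustrate how a multi-ACCDOA-style output can represent several simultaneous events of the same class, the following is a minimal sketch and not the baseline's actual code. It assumes each track holds one Cartesian vector per class, that the vector length is read as activity and its direction as the direction of arrival, and that azimuth is measured counter-clockwise from the x-axis. The function name `decode_multi_accdoa`, the threshold of 0.5, and the toy input sizes are assumptions made for illustration; any merging of near-identical tracks is omitted.

```python
import math

def decode_multi_accdoa(frame, threshold=0.5):
    """Decode one time frame of a multi-ACCDOA-style output.

    frame[track][cls] is an (x, y, z) vector. Its length is read as the
    activity of class `cls` on that track, and its direction as the DOA.
    Returns a list of (cls, azimuth_deg, elevation_deg) for active vectors.
    """
    events = []
    for track in frame:
        for cls, (x, y, z) in enumerate(track):
            norm = math.sqrt(x * x + y * y + z * z)
            if norm <= threshold:  # assumed activity threshold
                continue
            azimuth = math.degrees(math.atan2(y, x))
            elevation = math.degrees(math.asin(z / norm))
            events.append((cls, round(azimuth, 1), round(elevation, 1)))
    return events

# Two tracks, two classes (toy sizes; the dataset has 13 target classes).
frame = [
    [(0.9, 0.0, 0.0), (0.0, 0.1, 0.0)],  # track 0: class 0 active along +x
    [(0.0, 0.8, 0.0), (0.0, 0.0, 0.0)],  # track 1: class 0 active along +y
]
events = decode_multi_accdoa(frame)
print(events)
```

Because each track carries its own vector per class, the two class-0 events above are reported separately, which a single vector per class could not express.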