Sound event localisation and detection (SELD) is a machine listening task that combines the temporal detection of sound events with their localisation (direction-of-arrival estimation) within an audio clip, usually of long duration. Because the datasets available for this task are large, deep learning solutions now dominate the state of the art. Most of them take 2D representations of the audio (spectrograms of various kinds) and process them with a convolutional-recurrent network. This submission studies the squeeze-excitation technique in the convolutional part of the network and how it affects the performance of the system. It builds on the study carried out by the same team last year, which considered only the MIC dataset; this year the technique is evaluated on each of the datasets. The modification improves the performance of the system over the baseline on the MIC dataset.
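For readers unfamiliar with squeeze-excitation, the following is a minimal sketch of the standard operation: each channel of a convolutional feature map is averaged to a single value (squeeze), the resulting vector passes through a small two-layer bottleneck ending in a sigmoid (excitation), and each channel is rescaled by its gate. It is written in plain Python on nested lists of shape [channels][time][frequency] for clarity. The function names, the weight layout and the toy sizes are assumptions made for illustration; the exact variant and placement used in the submission are not specified here.

```python
import math


def squeeze(feature_map):
    """Global average pooling per channel: [C][T][F] -> [C]."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(vector, weights, biases):
    """Fully connected layer; weights has shape [n_out][n_in]."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]


def relu(vector):
    return [max(0.0, x) for x in vector]


def sigmoid(vector):
    return [1.0 / (1.0 + math.exp(-x)) for x in vector]


def squeeze_excite(feature_map, w1, b1, w2, b2):
    """Rescale each channel of feature_map by a learned gate in (0, 1).

    w1/b1 reduce C channels to a bottleneck of size C // r (hypothetical r),
    w2/b2 expand the bottleneck back to C channels.
    """
    z = squeeze(feature_map)                      # [C]
    hidden = relu(dense(z, w1, b1))               # [C // r]
    gates = sigmoid(dense(hidden, w2, b2))        # [C]
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]


if __name__ == "__main__":
    # Toy feature map: 2 channels, 2 time frames, 2 frequency bins.
    fm = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
    # Illustrative weights for a bottleneck of size 1 (not trained values).
    w1, b1 = [[0.1, -0.2]], [0.0]
    w2, b2 = [[0.5], [-0.5]], [0.0, 0.0]
    print(squeeze_excite(fm, w1, b1, w2, b2))
```

In a convolutional-recurrent SELD network, a block like this would typically follow a convolutional layer, so that the channels passed to the recurrent part are reweighted according to global context. The output keeps the shape of the input, which is what allows the block to be added to an existing architecture without changing the surrounding layers.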