The aim of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset. The systems need to correctly detect the sound events present in a recorded audio clip and localize those events in time. This year's task is a follow-up to DCASE 2021 Task 4, with some important novelties. The goal of this paper is to describe and motivate these new additions and to analyze their impact on the baseline system. We introduced three main novelties: the use of external datasets, including recently released strongly annotated clips from AudioSet; the possibility of leveraging pretrained models; and a new energy consumption metric to raise awareness of the ecological impact of training sound event detectors. Results on the baseline system show that leveraging open-source models pretrained on AudioSet significantly improves performance in terms of event classification, but not in terms of event segmentation.