Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge uses a heterogeneous dataset that includes both recorded and synthetic soundscapes. Until recently, only target sound events were considered when synthesizing the soundscapes. However, recorded soundscapes often contain a substantial number of non-target events that may affect performance. In this paper, we focus on the impact of these non-target events in the synthetic soundscapes. First, we investigate to what extent using non-target events in the training phase, in the validation phase, or in neither helps the system detect target events correctly. Second, we analyze to what extent adjusting the signal-to-noise ratio between target and non-target events at training improves sound event detection performance. The results show that using both target and non-target events in only one of the phases (validation or training) helps the system detect sound events properly, outperforming the baseline (which uses non-target events in both phases). The paper also reports the results of a preliminary study that evaluates the system on clips containing only non-target events. This opens questions for future work on the non-target subset and on the acoustic similarity between target and non-target events, which might confuse the system.
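To make the signal-to-noise ratio adjustment concrete, the following is a minimal sketch of how a non-target event could be scaled so that a target event sits at a chosen SNR above it before the two are summed. It is an illustration under stated assumptions, not the soundscape synthesis pipeline used in the paper: the function names `mean_power`, `gain_for_snr`, and `mix_at_snr` are hypothetical, signals are plain lists of samples of equal length, and the SNR is defined as the ratio of mean powers in decibels.

```python
import math


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def gain_for_snr(target, non_target, snr_db):
    """Gain for the non-target signal that yields the requested SNR.

    Assumed definition (not taken from the paper):
        SNR_dB = 10 * log10(P_target / (g**2 * P_non_target))
    Solving for g gives the expression returned below.
    """
    p_target = mean_power(target)
    p_non_target = mean_power(non_target)
    if p_non_target == 0:
        raise ValueError("non-target signal is silent; SNR is undefined")
    return math.sqrt(p_target / (p_non_target * 10 ** (snr_db / 10)))


def mix_at_snr(target, non_target, snr_db):
    """Sum the target with the non-target event scaled to the given SNR."""
    g = gain_for_snr(target, non_target, snr_db)
    return [t + g * n for t, n in zip(target, non_target)]


if __name__ == "__main__":
    # Toy signals standing in for a target and a non-target event.
    target = [1.0, -1.0] * 4
    non_target = [0.5, -0.5] * 4
    g = gain_for_snr(target, non_target, snr_db=6.0)
    scaled = [g * n for n in non_target]
    achieved = 10 * math.log10(mean_power(target) / mean_power(scaled))
    print(f"gain={g:.4f}, achieved SNR={achieved:.2f} dB")
```

Raising `snr_db` makes the non-target event quieter relative to the target, and lowering it makes the non-target event more prominent, which is the knob the second experiment varies at training time.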