Few-shot audio event detection aims to locate, in time, the occurrences of a novel sound class given only a few examples. In this work, we propose a system based on segment-level metric learning for the DCASE 2022 challenge on few-shot bioacoustic event detection (Task 5). We make better use of the negative data within each sound class when building the loss function, and we use transductive inference to adapt better to the evaluation set. For the input feature, we find per-channel energy normalization (PCEN) concatenated with delta mel-frequency cepstral coefficients (delta MFCCs) to be the most effective combination. We also introduce new data augmentation and post-processing procedures for this task. Our final system achieves an F-measure of 68.74 on the DCASE Task 5 validation set, outperforming the baseline F-measure of 29.5 by a large margin. Our system is fully open-sourced at https://github.com/haoheliu/DCASE_2022_Task_5.
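To make the idea of segment-level metric learning with negative data concrete, the following is a minimal sketch and not the released implementation. It assumes a generic prototype-based formulation: each audio segment has already been mapped to an embedding vector, the few annotated events give positive support segments, and the unannotated regions of the same recording give negative support segments. All function names and the toy two-dimensional embeddings are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors, used as a class prototype."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def positive_probability(query, pos_proto, neg_proto):
    """Two-way softmax over negative squared distances to the prototypes."""
    z = sq_dist(query, neg_proto) - sq_dist(query, pos_proto)
    # Numerically stable logistic function of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def detect(queries, pos_support, neg_support, threshold=0.5):
    """Label each query segment as event (True) or background (False)."""
    pos_proto = mean_vector(pos_support)
    neg_proto = mean_vector(neg_support)
    probs = [positive_probability(q, pos_proto, neg_proto) for q in queries]
    return probs, [p > threshold for p in probs]


# Toy embeddings (assumed values, for illustration only).
pos_support = [[1.0, 1.0], [1.2, 0.8]]   # segments inside annotated events
neg_support = [[0.0, 0.0], [0.2, -0.2]]  # segments between annotated events
queries = [[1.0, 1.0], [0.0, 0.1]]

probs, decisions = detect(queries, pos_support, neg_support)
print(decisions)
```

In this sketch the first query lies close to the positive prototype and the second close to the negative one, so only the first is flagged as an event. A transductive variant would additionally refine the prototypes using the unlabeled query segments, which is omitted here.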