Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection

Sound event detection (SED) has gained increasing attention owing to its wide application in areas such as surveillance and video indexing. Existing SED models mainly generate frame-level predictions, which turns the task into a sequence multi-label classification problem. A critical issue with frame-based models is that they pursue the best frame-level prediction rather than the best event-level prediction. They also require post-processing and cannot be trained end to end. This paper first presents the one-dimensional Detection Transformer (1D-DETR), inspired by the Detection Transformer for image object detection. Then, given the characteristics of SED, an audio query branch and a one-to-many matching strategy for fine-tuning the model are added to 1D-DETR to form the Sound Event Detection Transformer (SEDT). To our knowledge, SEDT is the first event-based and end-to-end SED model. Experiments on the URBAN-SED dataset and the DCASE2019 Task4 dataset both show that SEDT achieves competitive performance.
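The post-processing step that frame-based models depend on can be illustrated with a minimal sketch. It is not taken from the paper: the function name `frames_to_events`, the 0.5 threshold, and the 0.02 s hop size are all assumptions chosen for illustration. It shows how per-frame probabilities for one class are thresholded and merged into (onset, offset) events, which is the conversion an event-based model such as SEDT aims to avoid by predicting event boundaries directly.

```python
from typing import List, Tuple


def frames_to_events(
    probs: List[float],
    threshold: float = 0.5,      # assumed decision threshold, not from the paper
    hop_seconds: float = 0.02,   # assumed frame hop, not from the paper
) -> List[Tuple[float, float]]:
    """Merge consecutive active frames of one class into (onset, offset) pairs in seconds."""
    events: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the event currently open, if any
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i  # an event begins at this frame
        elif not active and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))
            start = None  # the event ended just before this frame
    if start is not None:
        # close an event that is still active at the end of the clip
        events.append((start * hop_seconds, len(probs) * hop_seconds))
    return events


# Usage: six frames with a one-second hop to keep the arithmetic readable.
example = frames_to_events([0.1, 0.9, 0.8, 0.2, 0.7, 0.6], hop_seconds=1.0)
```

Because the threshold and any smoothing are tuned separately from the training loss, a model optimized this way is fitted to frame-level accuracy rather than to the events that are finally reported.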

