评估自留成果地物促进无害事件本地化和探测的自留情况 (Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection)

Joint sound event localization and detection (SELD) is an emerging audio signal processing task adding spatial dimensions to acoustic scene analysis and sound event detection. A popular approach to modeling SELD jointly is using convolutional recurrent neural network (CRNN) models, where CNNs learn high-level features from multi-channel audio input and the RNNs learn temporal relationships from these high-level features. However, RNNs have some drawbacks, such as a limited capability to model long temporal dependencies and slow training and inference times due to their sequential processing nature. Recently, a few SELD studies used multi-head self-attention (MHSA), among other innovations in their models. MHSA and the related transformer networks have shown state-of-the-art performance in various domains. While they can model long temporal dependencies, they can also be parallelized efficiently. In this paper, we study in detail the effect of MHSA on the SELD task. Specifically, we examined the effects of replacing the RNN blocks with self-attention layers. We studied the influence of stacking multiple self-attention blocks, using multiple attention heads in each self-attention block, and the effect of position embeddings and layer normalization. Evaluation on the DCASE 2021 SELD (task 3) development data set shows a significant improvement in all employed metrics compared to the baseline CRNN accompanying the task.

翻译：联合声音活动本地化和探测(SELD)是一个新兴的音频信号处理任务,在声学场景分析和声音事件探测中增加了空间维度。最近,一些SELD研究采用多头自省(MHSA)和其他创新模式,对SELD进行联合建模。MHSA和相关变压器网络在不同领域展示了最新水平的性能。虽然它们可以模拟长期的时际依赖性,但它们也可以有效地与这些高水平特征并行。在本文中,我们详细研究MHSA对SELD任务的影响。具体地说,我们研究了用自控层取代RNNE区的影响。最近,一些SELD研究使用了多头自省(MHSA)和其他模型的创新方法。MHSA和相关变压器网络在不同领域展示了最新水平的状态。虽然它们可以模拟长期的时空依赖性关系,但也可以同时进行。我们详细研究MHSA对SA对SL任务的影响。我们研究了用自控层结构取代RNNN(M)的影响。我们研究了多头自控多头自控(MHA)和SEDSEA(SEA)中每个重要的SAL 20级升级的自评分级(SD)的SB级) 的自我定位定位和双级(SD) 20级(SIS级) 的SD级) 的自我定位) 的SD) 。

相关内容

MoDELS

关注 44

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/