In this work, we aim to improve the expressive capacity of waveform-based discriminative music networks by modeling both sequential (temporal) and hierarchical information in an efficient end-to-end architecture. We present MuSLCAT, or Multi-scale and Multi-level Convolutional Attention Transformer, a novel architecture for learning robust representations of complex music tags directly from raw waveform recordings. We also introduce a lightweight variant of MuSLCAT called MuSLCAN, short for Multi-scale and Multi-level Convolutional Attention Network. Both MuSLCAT and MuSLCAN model features at multiple scales and levels through an integrated frontend-backend architecture. The frontend targets different frequency ranges while modeling long-range dependencies and multi-level interactions using two convolutional attention networks built from attention-augmented convolution (AAC) blocks. The backend dynamically recalibrates the multi-scale and multi-level features extracted by the frontend by incorporating self-attention. The two architectures differ only in their backend components: MuSLCAT's backend is a modified version of BERT, while MuSLCAN's is a simple AAC block. We validate the proposed MuSLCAT and MuSLCAN architectures by comparing them to state-of-the-art networks on four benchmark datasets for music tagging and genre recognition. Our experiments show that MuSLCAT and MuSLCAN consistently yield competitive results compared to state-of-the-art waveform-based models while requiring considerably fewer parameters.
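To make the AAC idea concrete, the following is a minimal NumPy sketch of a single attention-augmented 1-D convolution step: a standard convolution captures local structure while a single-head self-attention pass captures long-range context, and the two feature maps are concatenated along the channel axis. All function and parameter names here are illustrative assumptions, not the paper's actual implementation, which operates on raw waveforms with learned multi-head attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aac_block(x, conv_w, wq, wk, wv):
    """Hypothetical attention-augmented 1-D convolution sketch.

    x:      (T, C_in) input feature sequence.
    conv_w: (k, C_in, C_conv) convolution kernel (odd k, 'same' padding).
    wq, wk: (C_in, d) query/key projections.
    wv:     (C_in, C_attn) value projection.
    Returns (T, C_conv + C_attn): local conv features concatenated
    with global self-attention features.
    """
    k = conv_w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    T = x.shape[0]
    # local branch: plain 1-D convolution over the padded sequence
    conv = np.stack([
        np.tensordot(xp[t:t + k], conv_w, axes=([0, 1], [0, 1]))
        for t in range(T)
    ])
    # global branch: single-head scaled dot-product self-attention
    q, key, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ key.T / np.sqrt(q.shape[-1])) @ v
    # fuse local and global features channel-wise
    return np.concatenate([conv, attn], axis=-1)
```

Stacking such blocks lets each layer mix local spectral detail with sequence-wide dependencies, which is the motivation for using AAC blocks in both the frontend and MuSLCAN's backend.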