The attention mechanism was first proposed in the field of visual image processing, but it truly took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied an attention mechanism on top of an RNN for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly in machine translation; their work is generally regarded as the first to apply attention to NLP. Similar attention-based RNN extensions were then applied to a wide range of NLP tasks, and more recently, how to use attention mechanisms within CNNs has also become an active research topic. The figure below sketches the rough trajectory of attention research.

This paper proposes a conceptually simple yet highly effective attention module. Unlike existing channel or spatial attention modules, it derives 3D attention weights for a feature map without adding any extra parameters. Specifically, drawing on well-established neuroscience theories, the authors propose optimizing an energy function to estimate the importance of each neuron. They further derive a fast closed-form solution to this energy function and show that it can be implemented in fewer than 10 lines of code. Another advantage of the module is that most of its operations follow directly from the chosen energy function, avoiding extensive architecture tuning. Finally, the effectiveness and flexibility of the proposed attention module are validated on a variety of tasks.

The main contributions of this paper are as follows:

  • Inspired by attention mechanisms in the human brain, the paper proposes a 3D attention module and designs an energy function for computing the attention weights;
  • It derives a closed-form solution to the energy function, which speeds up the computation of attention weights and yields a lightweight attention module;
  • The proposed module is plugged into existing ConvNets, and its flexibility and effectiveness are verified on different tasks.
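The paper is not reproduced here, so the following is only a minimal PyTorch-style sketch of what such a parameter-free, energy-based 3D weighting could look like, assuming the per-neuron importance is obtained in closed form from each channel's mean and variance as described above; the module name and the regularization constant e_lambda are illustrative, not taken from the paper.

    import torch
    import torch.nn as nn

    class EnergyAttention(nn.Module):
        """Sketch of a parameter-free 3D attention weighting.

        Each neuron's importance is computed in closed form from the
        channel-wise mean and variance, passed through a sigmoid, and used
        to rescale the feature map. No learnable parameters are added.
        """

        def __init__(self, e_lambda: float = 1e-4):
            super().__init__()
            self.e_lambda = e_lambda  # small regularizer (illustrative value)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, channels, height, width)
            n = x.shape[2] * x.shape[3] - 1
            # Squared deviation of each neuron from its channel mean.
            d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
            # Channel-wise variance estimate.
            v = d.sum(dim=(2, 3), keepdim=True) / n
            # Closed-form importance score per neuron.
            e_inv = d / (4 * (v + self.e_lambda)) + 0.5
            return x * torch.sigmoid(e_inv)

    # Usage: rescale a feature map without adding any learnable parameters.
    feats = torch.randn(2, 64, 32, 32)
    out = EnergyAttention()(feats)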

Latest Papers

Non-autoregressive mechanisms can significantly decrease inference time for speech transformers, especially when the single step variant is applied. Previous work on CTC alignment-based single step non-autoregressive transformer (CASS-NAT) has shown a large real time factor (RTF) improvement over autoregressive transformers (AT). In this work, we propose several methods to improve the accuracy of the end-to-end CASS-NAT, followed by performance analyses. First, convolution augmented self-attention blocks are applied to both the encoder and decoder modules. Second, we propose to expand the trigger mask (acoustic boundary) for each token to increase the robustness of CTC alignments. In addition, iterated loss functions are used to enhance the gradient update of low-layer parameters. Without using an external language model, the WERs of the improved CASS-NAT, when using the three methods, are 3.1%/7.2% on Librispeech test clean/other sets and the CER is 5.4% on the Aishell1 test set, achieving a 7%~21% relative WER/CER improvement. For the analyses, we plot attention weight distributions in the decoders to visualize the relationships between token-level acoustic embeddings. When the acoustic embeddings are visualized, we find that they have a similar behavior to word embeddings, which explains why the improved CASS-NAT performs similarly to AT.
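One of the proposed changes, widening each token's trigger mask (its CTC-derived acoustic boundary), is easy to illustrate. The sketch below is not the paper's code: it assumes hypothetical (start, end) encoder-frame spans from a CTC alignment and simply widens each token's span by a few frames on either side, so that small alignment errors do not cut off relevant acoustic frames.

    import torch

    def expanded_trigger_masks(boundaries, num_frames, expand=1):
        # boundaries: list of (start, end) encoder-frame spans per token
        # (end exclusive), e.g. taken from a CTC alignment.
        # Returns a (num_tokens, num_frames) boolean mask where True marks
        # the frames a token's acoustic embedding may attend to; each span
        # is widened by `expand` frames on both sides for robustness.
        masks = torch.zeros(len(boundaries), num_frames, dtype=torch.bool)
        for i, (start, end) in enumerate(boundaries):
            lo = max(0, start - expand)
            hi = min(num_frames, end + expand)
            masks[i, lo:hi] = True
        return masks

    # Example: three tokens over ten encoder frames, widened by one frame.
    print(expanded_trigger_masks([(0, 3), (3, 6), (6, 10)], 10, expand=1).int())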
