Although deep neural networks generally have fixed network structures, the concept of dynamic mechanisms has drawn increasing attention in recent years. Attention mechanisms compute input-dependent dynamic attention weights for aggregating a sequence of hidden states. Dynamic network configuration in convolutional neural networks (CNNs) selectively activates only part of the network at a time for different inputs. In this paper, we combine the two dynamic mechanisms for text classification tasks. Traditional attention mechanisms attend to the whole sequence of hidden states of an input sentence, yet in most cases not all of this attention is needed, especially for long sequences. We propose a novel method called Gated Attention Network (GA-Net) that uses an auxiliary network to dynamically select a subset of elements to attend to, and computes attention weights to aggregate only the selected elements. It avoids a significant amount of unnecessary computation on unattended elements and allows the model to focus on the important parts of the sequence. Experiments on various datasets show that the proposed method achieves better performance than all baseline models with global or local attention, while requiring less computation and offering better interpretability. The idea is also promising to extend to more complex attention-based models, such as transformers and sequence-to-sequence models.
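To make the idea concrete, the sketch below is a minimal PyTorch illustration (not the paper's released code) of gated attention: an auxiliary network scores each hidden state, a hard gate keeps a subset of positions, and attention weights are computed only over the kept positions. The straight-through hard-sigmoid gate and the 0.5 threshold are assumptions made for this sketch; the actual model may use a different relaxation (e.g., a Gumbel-softmax style estimator) to keep the gates differentiable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Sketch of gated attention: an auxiliary network predicts a binary
    gate per hidden state; attention is then computed only over the
    gated (kept) positions."""

    def __init__(self, hidden_dim, aux_dim=64):
        super().__init__()
        # Auxiliary network: scores each hidden state for selection.
        self.aux = nn.Sequential(
            nn.Linear(hidden_dim, aux_dim),
            nn.Tanh(),
            nn.Linear(aux_dim, 1),
        )
        # Standard additive attention scorer over the kept positions.
        self.attn = nn.Linear(hidden_dim, 1)

    def forward(self, h):                      # h: (batch, seq_len, hidden_dim)
        gate_logits = self.aux(h).squeeze(-1)  # (batch, seq_len)
        # Hard 0/1 gate with a straight-through estimator so gradients
        # still reach the auxiliary network (an assumption of this sketch).
        soft = torch.sigmoid(gate_logits)
        hard = (soft > 0.5).float()
        gate = hard + soft - soft.detach()

        scores = self.attn(h).squeeze(-1)      # (batch, seq_len)
        # Mask out unselected positions before the softmax.
        scores = scores.masked_fill(gate < 0.5, float('-inf'))
        weights = F.softmax(scores, dim=-1)
        weights = torch.nan_to_num(weights)    # all-masked rows -> zeros
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)
        return context, weights, gate


# Example: aggregate a batch of 2 sequences of length 5.
h = torch.randn(2, 5, 128)
context, weights, gate = GatedAttention(128)(h)
print(context.shape, gate.shape)               # (2, 128) and (2, 5)
```

In this sketch, computation is only notionally saved by masking; an actual implementation could skip the score computation for gated-out positions entirely, which is where the efficiency gain over full attention comes from.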