Although deep neural networks generally have fixed network structures, the concept of dynamic mechanisms has drawn increasing attention in recent years. Attention mechanisms compute input-dependent dynamic attention weights for aggregating a sequence of hidden states. Dynamic network configuration in convolutional neural networks (CNNs) selectively activates only part of the network at a time for different inputs. In this paper, we combine the two dynamic mechanisms for text classification tasks. Traditional attention mechanisms attend to the whole sequence of hidden states of an input sentence, yet in most cases not all of this attention is needed, especially for long sequences. We propose a novel method called Gated Attention Network (GA-Net) that uses an auxiliary network to dynamically select a subset of elements to attend to, and computes attention weights to aggregate only the selected elements. It avoids a significant amount of unnecessary computation on unattended elements and allows the model to focus on the important parts of the sequence. Experiments on various datasets show that the proposed method achieves better performance than all baseline models with global or local attention, while requiring less computation and offering better interpretability. The idea is also promising to extend to more complex attention-based models, such as transformers and sequence-to-sequence models.
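To make the idea concrete, the sketch below is a minimal PyTorch illustration (not the paper's released code) of gated attention: an auxiliary network scores each hidden state, a hard gate keeps a subset of positions, and attention weights are computed only over the kept positions. The straight-through hard-sigmoid gate and the 0.5 threshold are assumptions made for this sketch; the actual model may use a different relaxation (e.g., a Gumbel-softmax style estimator) to keep the gates differentiable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Sketch of gated attention: an auxiliary network predicts a binary
    gate per hidden state; attention is then computed only over the
    gated (kept) positions."""

    def __init__(self, hidden_dim, aux_dim=64):
        super().__init__()
        # Auxiliary network: scores each hidden state for selection.
        self.aux = nn.Sequential(
            nn.Linear(hidden_dim, aux_dim),
            nn.Tanh(),
            nn.Linear(aux_dim, 1),
        )
        # Standard additive attention scorer over the kept positions.
        self.attn = nn.Linear(hidden_dim, 1)

    def forward(self, h):                      # h: (batch, seq_len, hidden_dim)
        gate_logits = self.aux(h).squeeze(-1)  # (batch, seq_len)
        # Hard 0/1 gate with a straight-through estimator so gradients
        # still reach the auxiliary network (an assumption of this sketch).
        soft = torch.sigmoid(gate_logits)
        hard = (soft > 0.5).float()
        gate = hard + soft - soft.detach()

        scores = self.attn(h).squeeze(-1)      # (batch, seq_len)
        # Mask out unselected positions before the softmax.
        scores = scores.masked_fill(gate < 0.5, float('-inf'))
        weights = F.softmax(scores, dim=-1)
        weights = torch.nan_to_num(weights)    # all-masked rows -> zeros
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)
        return context, weights, gate


# Example: aggregate a batch of 2 sequences of length 5.
h = torch.randn(2, 5, 128)
context, weights, gate = GatedAttention(128)(h)
print(context.shape, gate.shape)               # (2, 128) and (2, 5)
```

In this sketch, computation is only notionally saved by masking; an actual implementation could skip the score computation for gated-out positions entirely, which is where the efficiency gain over full attention comes from.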