The attention mechanism has been widely used as a model component in deep neural networks and has become a critical building block in many state-of-the-art natural language models. Despite its great empirical success, the working mechanism of attention has not yet been investigated at sufficient theoretical depth. In this paper, we set up a simple text classification task and study the dynamics of training a simple attention-based classification model using gradient descent. In this setting, we show that, for each discriminative word that the model should attend to, a persisting identity relates its embedding to the inner product of its key and the query. This allows us to prove that, when the attention output is classified by a linear classifier, training must converge to attending to the discriminative words. Experiments are performed that validate our theoretical analysis and provide further insights.
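To make the setting concrete, the following is a minimal sketch (not the authors' released code) of the kind of model the abstract describes: each word has an embedding and a key, a single trainable query scores the keys, softmax attention pools the embeddings, and a linear classifier reads the pooled output. All class names, dimensions, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SimpleAttentionClassifier(nn.Module):
    """Hypothetical sketch of a single-query attention classifier."""

    def __init__(self, vocab_size: int, dim: int, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)      # word embeddings e_w
        self.key = nn.Embedding(vocab_size, dim)        # per-word keys k_w
        self.query = nn.Parameter(torch.randn(dim))     # shared query q
        self.classifier = nn.Linear(dim, num_classes)   # linear classifier

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer word ids
        e = self.embed(tokens)                    # (batch, seq, dim)
        k = self.key(tokens)                      # (batch, seq, dim)
        scores = k @ self.query                   # <k_w, q> per word
        attn = torch.softmax(scores, dim=-1)      # attention weights
        ctx = (attn.unsqueeze(-1) * e).sum(dim=1)  # attention output
        return self.classifier(ctx)               # class logits
```

Under this reading, training such a model end to end with gradient descent on a cross-entropy loss is the dynamics the paper analyzes; the claimed convergence result concerns whether the softmax weights `attn` concentrate on the discriminative words.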