On the Dynamics of Training Attention Models

The attention mechanism is widely used as a component of deep neural networks and has become a critical building block in many state-of-the-art natural language models. Despite its empirical success, the working mechanism of attention has not yet been investigated in sufficient theoretical depth. In this paper, we set up a simple text classification task and study the dynamics of training a simple attention-based classification model using gradient descent. In this setting, we show that, for each discriminative word that the model should attend to, a persisting identity exists relating the word's embedding to the inner product of its key and the query. This allows us to prove that training must converge to attending to the discriminative words when the attention output is classified by a linear classifier. Experiments validate our theoretical analysis and provide further insights.
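To make the setting in the abstract concrete, the following is a minimal sketch of the forward pass of an attention-based classifier of the general kind described: each word has an embedding and a key, a single query scores the words, and a linear classifier acts on the attention output. The function name `attention_classify`, the dimensions, and the random initialization are illustrative assumptions and are not taken from the paper, whose exact model may differ.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_classify(token_ids, embeddings, keys, query, classifier_w):
    """Forward pass of a toy attention classifier (hypothetical, for illustration)."""
    scores = keys[token_ids] @ query           # inner product of each word's key with the query
    weights = softmax(scores)                  # attention weights over the words in the sentence
    context = weights @ embeddings[token_ids]  # attention output: weighted sum of word embeddings
    logits = classifier_w @ context            # linear classifier applied to the attention output
    return weights, logits

# Assumed toy sizes and random parameters; a real setup would train these by gradient descent.
rng = np.random.default_rng(0)
vocab_size, dim, num_classes = 6, 4, 2
embeddings = rng.normal(size=(vocab_size, dim))
keys = rng.normal(size=(vocab_size, dim))
query = rng.normal(size=dim)
classifier_w = rng.normal(size=(num_classes, dim))

sentence = np.array([0, 3, 5, 3])  # word 3 appears twice
weights, logits = attention_classify(sentence, embeddings, keys, query, classifier_w)
```

In this sketch the attention weight of a word depends only on the inner product of its key with the query, so repeated occurrences of the same word receive equal weight. That inner product and the word's embedding are the two quantities the abstract says are tied together by an identity during training.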


Related content

Attention mechanism


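The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment simultaneously on a machine translation task; their work is regarded as the first to apply attention to NLP. Attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.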
