The neural attention mechanism has been incorporated into deep neural networks to achieve state-of-the-art performance in various domains. Most such models use multi-head self-attention, which is appealing for its ability to attend to information from different perspectives. This paper introduces alignment attention, which explicitly encourages self-attention to match the distributions of the key and query within each head. The resulting alignment attention networks can be optimized as an unsupervised regularization in the existing attention framework. Any model with self-attention, including pre-trained ones, can be readily converted to the proposed alignment attention. On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our approach on graph attention and visual question answering, showing the great potential of incorporating our alignment method into various attention-related tasks.
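To make the idea concrete, below is a minimal sketch of how such an alignment term could be added on top of standard multi-head attention. It is not the paper's actual objective: the function names (`alignment_regularizer`, `attention_with_alignment`), the `reg_weight` coefficient, and the use of simple moment matching between the per-head query and key distributions are all illustrative assumptions standing in for the paper's distribution-matching criterion.

```python
import torch
import torch.nn.functional as F


def alignment_regularizer(q, k):
    """Hypothetical surrogate for the alignment objective: encourage the
    per-head distributions of query and key vectors to match by penalizing
    the gap between their first and second moments (a simple stand-in for
    the paper's actual distribution-matching criterion).

    q, k: tensors of shape (batch, n_heads, seq_len, d_head).
    Returns a scalar regularization loss.
    """
    # Empirical mean and std over the token dimension, computed per head.
    q_mean, k_mean = q.mean(dim=2), k.mean(dim=2)
    q_std, k_std = q.std(dim=2), k.std(dim=2)
    # Penalize the mismatch between the two per-head distributions.
    return F.mse_loss(q_mean, k_mean) + F.mse_loss(q_std, k_std)


def attention_with_alignment(q, k, v, reg_weight=0.1):
    """Standard scaled dot-product attention plus the alignment penalty,
    returned separately so it can be added to the task loss as an
    unsupervised regularization term during training."""
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5
    attn = torch.softmax(scores, dim=-1)
    out = attn @ v
    reg = reg_weight * alignment_regularizer(q, k)
    return out, reg
```

Because the penalty only touches the existing query and key projections, a pre-trained self-attention model could in principle be fine-tuned with this extra term without any architectural change, which reflects the plug-in nature of the approach described above.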