The neural attention mechanism has been incorporated into deep neural networks to achieve state-of-the-art performance in various domains. Most such models use multi-head self-attention, which is appealing for its ability to attend to information from different perspectives. This paper introduces alignment attention, which explicitly encourages self-attention to match the distributions of the key and query within each head. The resulting alignment attention networks can be optimized as an unsupervised regularization in the existing attention framework. Any model with self-attention, including pre-trained ones, can be readily converted to the proposed alignment attention. On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our approach on graph attention and visual question answering, showing the great potential of incorporating our alignment method into various attention-related tasks.
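To make the idea concrete, below is a minimal sketch of how such an alignment term could be added on top of standard multi-head attention. It is not the paper's actual objective: the function names (`alignment_regularizer`, `attention_with_alignment`), the `reg_weight` coefficient, and the use of simple moment matching between the per-head query and key distributions are all illustrative assumptions standing in for the paper's distribution-matching criterion.

```python
import torch
import torch.nn.functional as F


def alignment_regularizer(q, k):
    """Hypothetical surrogate for the alignment objective: encourage the
    per-head distributions of query and key vectors to match by penalizing
    the gap between their first and second moments (a simple stand-in for
    the paper's actual distribution-matching criterion).

    q, k: tensors of shape (batch, n_heads, seq_len, d_head).
    Returns a scalar regularization loss.
    """
    # Empirical mean and std over the token dimension, computed per head.
    q_mean, k_mean = q.mean(dim=2), k.mean(dim=2)
    q_std, k_std = q.std(dim=2), k.std(dim=2)
    # Penalize the mismatch between the two per-head distributions.
    return F.mse_loss(q_mean, k_mean) + F.mse_loss(q_std, k_std)


def attention_with_alignment(q, k, v, reg_weight=0.1):
    """Standard scaled dot-product attention plus the alignment penalty,
    returned separately so it can be added to the task loss as an
    unsupervised regularization term during training."""
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5
    attn = torch.softmax(scores, dim=-1)
    out = attn @ v
    reg = reg_weight * alignment_regularizer(q, k)
    return out, reg
```

Because the penalty only touches the existing query and key projections, a pre-trained self-attention model could in principle be fine-tuned with this extra term without any architectural change, which reflects the plug-in nature of the approach described above.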