Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

Most of today's AI systems focus on using self-attention mechanisms and transformer architectures on large amounts of diverse data to achieve impressive performance gains. In this paper, we propose to augment the transformer architecture with an external attention mechanism to bring external knowledge and context to bear. By integrating external information into the prediction process, we hope to reduce the need for ever-larger models and increase the democratization of AI systems. We find that the proposed external attention mechanism can significantly improve the performance of existing AI systems, allowing practitioners to easily customize foundation AI models to many diverse downstream applications. In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities. The proposed system, Knowledgeable External Attention for commonsense Reasoning (KEAR), reaches human parity on the open CommonsenseQA research benchmark with an accuracy of 89.4% in comparison to the human accuracy of 88.9%.
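The abstract does not say how external information enters the prediction process. One possible reading, assumed here purely for illustration, is text-level integration: retrieved facts are appended to the question and candidate answer, so the model's ordinary self-attention can also attend to the external text. The sketch below shows only that input-building step. The knowledge store, the keyword lookup, the `[SEP]` separator string, and the names `TOY_KNOWLEDGE`, `retrieve`, and `build_input` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical toy knowledge store (assumption): the abstract does not
# describe the knowledge sources that the real system uses.
TOY_KNOWLEDGE: Dict[str, str] = {
    "umbrella": "an umbrella is used for protection from rain",
    "rain": "rain is water falling from clouds",
    "oven": "an oven is used for baking food",
}


def retrieve(text: str, store: Dict[str, str], limit: int = 2) -> List[str]:
    """Return up to `limit` facts whose key occurs as a word in `text`.

    Naive keyword lookup, used here only as a stand-in for a real retriever.
    """
    words = set(text.lower().replace("?", " ").split())
    # Iterate keys in sorted order so the result is deterministic.
    hits = [fact for key, fact in sorted(store.items()) if key in words]
    return hits[:limit]


def build_input(question: str, choice: str, store: Dict[str, str],
                sep: str = " [SEP] ") -> str:
    """Join question, candidate answer, and retrieved facts into one sequence.

    A transformer that reads this sequence can attend to the appended facts
    with its existing self-attention layers (the assumed form of
    "external attention" in this sketch).
    """
    facts = retrieve(question + " " + choice, store)
    return sep.join([question, choice] + facts)


if __name__ == "__main__":
    print(build_input("What do you carry when it might rain?",
                      "umbrella", TOY_KNOWLEDGE))
```

Under this assumption each candidate answer would get its own augmented sequence, and a downstream model (not shown) would score them; swapping in a different knowledge store changes what the model sees without changing its weights.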


Related content

Attention mechanism


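The attention mechanism was first proposed in the field of computer vision, but it became widely popular with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly in a machine translation task; their work is generally regarded as the first to apply attention in NLP. Attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of attention research.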
