Most of today's AI systems rely on self-attention mechanisms and transformer architectures trained on large amounts of diverse data to achieve impressive performance gains. In this paper, we propose to augment the transformer architecture with an external attention mechanism that brings external knowledge and context to bear. By integrating external information into the prediction process, we aim to reduce the need for ever-larger models and further the democratization of AI systems. We find that the proposed external attention mechanism significantly improves the performance of existing AI systems, allowing practitioners to easily customize foundation models for diverse downstream applications. In particular, we focus on the task of commonsense reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve their reasoning capabilities. The proposed system, Knowledgeable External Attention for commonsense Reasoning (KEAR), reaches human parity on the open CommonsenseQA research benchmark with an accuracy of 89.4\% versus the human accuracy of 88.9\%.
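To make the idea concrete, here is a minimal sketch of one way external attention can be realized: representations of retrieved external knowledge are concatenated with the input token representations, and standard scaled dot-product attention is computed over the combined sequence, so every input token can attend to the external context. This is an illustrative simplification, not the paper's exact architecture; the function names and the representation-level concatenation are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def external_attention(H_input, H_external):
    """Scaled dot-product attention over input representations
    concatenated with externally retrieved knowledge representations.

    H_input:    (n_in, d)  encoded input tokens
    H_external: (n_ext, d) encoded external knowledge tokens
    Returns:    (n_in + n_ext, d) attended representations, in which
                each input position has mixed in external context.
    """
    H = np.concatenate([H_input, H_external], axis=0)  # (n_in + n_ext, d)
    d = H.shape[1]
    scores = H @ H.T / np.sqrt(d)       # attention over the joint sequence
    weights = softmax(scores, axis=-1)  # rows sum to 1
    return weights @ H
```

In practice the external text (e.g. knowledge-graph triples or dictionary entries) would be tokenized and encoded by the same transformer, so the "external attention" reduces to ordinary self-attention over an input augmented with retrieved knowledge.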