In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT), to address the problem of head degradation in self-attention mechanisms. Our approach replaces the traditional multi-head self-attention with an Enhanced Multi-Head Attention (EMHA) mechanism, which relaxes the one-to-one mapping constraint between queries and keys and allows each query to attend to multiple keys. Furthermore, we introduce two interaction models, Inner-Subspace Interaction and Cross-Subspace Interaction, to fully exploit the many-to-many mapping capability of EMHA. Extensive experiments on a wide range of tasks (e.g., machine translation, abstractive summarization, grammar correction, language modelling, and automatic brain disease diagnosis) show its superiority, with only a modest increase in model size.
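To make the many-to-many idea concrete, the following is a minimal, illustrative PyTorch sketch of an attention layer in which every query subspace may attend to every key subspace, rather than only to its own. The module name `ManyToManyAttention`, the single output projection, and the simple mean over the resulting h×h heads are our own assumptions for illustration; they stand in for, and do not reproduce, the paper's Inner-Subspace and Cross-Subspace Interaction models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ManyToManyAttention(nn.Module):
    """Illustrative sketch: each of the h query subspaces attends to all h key subspaces."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.h = num_heads
        self.d_k = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        # Split queries, keys, values into h subspaces: (B, h, T, d_k)
        q = self.q_proj(x).view(B, T, self.h, self.d_k).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.h, self.d_k).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.h, self.d_k).transpose(1, 2)
        # Many-to-many mapping: query subspace i attends to key subspace j
        # for every pair (i, j), yielding h*h attention maps instead of h.
        scores = torch.einsum("bitd,bjsd->bijts", q, k) / self.d_k ** 0.5
        attn = F.softmax(scores, dim=-1)                  # (B, h, h, T, T)
        ctx = torch.einsum("bijts,bjsd->bijtd", attn, v)  # (B, h, h, T, d_k)
        # Crude aggregation over the key-subspace axis (an assumption; the
        # paper's interaction models are more elaborate), then merge heads.
        ctx = ctx.mean(dim=2).transpose(1, 2).reshape(B, T, self.h * self.d_k)
        return self.out_proj(ctx)

# Tiny usage example on random data.
attn = ManyToManyAttention(d_model=64, num_heads=4)
y = attn(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

Setting the aggregation step aside, the key difference from standard multi-head attention is the `einsum` over distinct head indices `i` and `j`, which is one plausible way to realize the relaxed query-key mapping described above.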