Transformer models with multi-head attention require caching intermediate results for efficient inference in generation tasks. However, the cache introduces additional memory costs and prevents the use of larger batch sizes for higher throughput. We propose memory-efficient lossless attention (called EL-attention) to address this issue. It avoids the heavy operations of building multi-head keys and values, so no cache is needed for them. EL-attention constructs an ensemble of attention results by expanding the query while keeping the key and value shared across heads. It produces the same result as multi-head attention with less GPU memory and faster inference speed. We conduct extensive experiments on Transformer, BART, and GPT-2 for summarization and question generation tasks. The results show that EL-attention speeds up existing models by 1.6x to 5.3x without loss of accuracy.
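To illustrate the idea of expanding the query while sharing a single key and value, the minimal PyTorch sketch below compares standard per-head key/value projection against an EL-attention-style reordering in which the key and value projections are folded into the query and output sides, so only the shared hidden states need to be kept. All tensor names, shapes, and the per-head weight layout are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch

# Hypothetical dimensions for the sketch.
batch, src_len, d_model, n_heads = 2, 7, 16, 4
d_head = d_model // n_heads
scale = d_head ** -0.5

torch.manual_seed(0)
H = torch.randn(batch, src_len, d_model)   # shared hidden states (the only thing kept)
q = torch.randn(batch, 1, d_model)         # single decoding-step query

# Per-head projection matrices (illustrative initialization and layout).
W_q = torch.randn(n_heads, d_model, d_head) / d_model ** 0.5
W_k = torch.randn(n_heads, d_model, d_head) / d_model ** 0.5
W_v = torch.randn(n_heads, d_model, d_head) / d_model ** 0.5
W_o = torch.randn(n_heads, d_head, d_model) / d_model ** 0.5  # output projection, split per head

def standard_mha():
    """Standard multi-head attention: builds per-head keys and values from H."""
    outs = []
    for i in range(n_heads):
        Q = q @ W_q[i]                        # (batch, 1, d_head)
        K = H @ W_k[i]                        # (batch, src_len, d_head), normally cached
        V = H @ W_v[i]                        # (batch, src_len, d_head), normally cached
        attn = torch.softmax(Q @ K.transpose(-1, -2) * scale, dim=-1)
        outs.append(attn @ V @ W_o[i])        # (batch, 1, d_model)
    return sum(outs)                          # summing per-head slices = concat + full W_o

def el_attention_style():
    """Expanded query, shared key/value: H is used directly as key and value."""
    outs = []
    for i in range(n_heads):
        # Fold the key projection into the query: (q W_q)(H W_k)^T = (q W_q W_k^T) H^T
        q_exp = q @ W_q[i] @ W_k[i].transpose(-1, -2)      # (batch, 1, d_model)
        attn = torch.softmax(q_exp @ H.transpose(-1, -2) * scale, dim=-1)
        # Fold the value projection into the output: (attn H) W_v = attn (H W_v)
        outs.append(attn @ H @ W_v[i] @ W_o[i])            # (batch, 1, d_model)
    return sum(outs)

# The two orderings are mathematically identical, up to floating-point error.
print(torch.allclose(standard_mha(), el_attention_style(), atol=1e-4))  # True
```

The reordering exploits associativity of matrix multiplication: since per-head keys and values are linear projections of the same hidden states, those projections can be applied to the (much smaller) query and output instead, leaving a single shared key/value tensor for all heads.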