We introduce Attention Free Transformer (AFT), an efficient variant of Transformers that eliminates the need for dot product self attention. In an AFT layer, the key and value are first combined with a set of learned position biases, the result of which is multiplied with the query in an element-wise fashion. This new operation has a memory complexity linear w.r.t. both the context size and the dimension of features, making it compatible with both large input and model sizes. We also introduce AFT-local and AFT-conv, two model variants that take advantage of the idea of locality and spatial weight sharing while maintaining global connectivity. We conduct extensive experiments on two autoregressive modeling tasks (CIFAR10 and Enwik8) as well as an image recognition task (ImageNet-1K classification). We show that AFT demonstrates competitive performance on all the benchmarks, while providing excellent efficiency.
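To make the described operation concrete, the following is a minimal NumPy sketch of a single AFT layer (the full, non-causal variant): keys are combined with a learned pairwise position bias, the weighted values are normalized, and the result is gated element-wise by a sigmoid of the query. The function name `aft_full`, the per-sequence shapes, and the omission of causal masking and multi-head details are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aft_full(Q, K, V, w):
    """Sketch of an AFT-full layer for a single sequence.

    Q, K, V: (T, d) query / key / value matrices.
    w:       (T, T) learned pairwise position biases.
    Returns: (T, d) output.
    """
    T, d = Q.shape
    out = np.empty_like(V)
    for t in range(T):
        # Combine keys with the position biases for target position t.
        logits = K + w[t][:, None]                   # (T, d)
        logits -= logits.max(axis=0, keepdims=True)  # numerical stability
        weights = np.exp(logits)                     # (T, d)
        num = (weights * V).sum(axis=0)              # weighted sum of values, (d,)
        den = weights.sum(axis=0)                    # normalizer, (d,)
        # Element-wise gating by the query.
        out[t] = sigmoid(Q[t]) * (num / den)
    return out

# Example usage with random inputs.
T, d = 8, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
w = rng.standard_normal((T, T)) * 0.1
Y = aft_full(Q, K, V, w)  # (T, d)
```

Note that no T x T attention map over the feature dimension is ever materialized; only per-dimension sums over the context are kept, which is what gives the linear memory complexity mentioned above.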