The quadratic computational and memory complexity of the Transformer's attention mechanism has limited its scalability for modeling long sequences. In this paper, we propose Luna, a linear unified nested attention mechanism that approximates softmax attention with two nested linear attention functions, yielding only linear (as opposed to quadratic) time and space complexity. Specifically, with the first attention function, Luna packs the input sequence into a sequence of fixed length. Then, the packed sequence is unpacked using the second attention function. Compared with a more traditional attention mechanism, Luna introduces an additional fixed-length sequence as input and an additional corresponding output, which allows Luna to perform the attention operation in linear time while also storing adequate contextual information. We perform extensive evaluations on three benchmarks of sequence modeling tasks: long-context sequence modeling, neural machine translation, and masked language modeling for large-scale pretraining. Competitive or even better experimental results demonstrate both the effectiveness and efficiency of Luna compared to a variety of models.
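The pack-and-unpack structure described above can be illustrated with a short sketch. The following is a minimal PyTorch example, not the paper's implementation: names such as `LunaAttention` and `pack_len`, the use of a learned extra sequence, and plain scaled dot-product attention for both stages are illustrative assumptions. It shows why the cost is linear in the sequence length n when the packed length l is a fixed constant.

```python
# A minimal sketch of the nested attention idea, assuming standard
# scaled dot-product attention for both the pack and unpack steps.
import torch
import torch.nn as nn
import torch.nn.functional as F


def attention(q, k, v):
    # Scaled dot-product attention: (B, Lq, D), (B, Lk, D) -> (B, Lq, D)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v


class LunaAttention(nn.Module):
    def __init__(self, d_model: int, pack_len: int = 16):
        super().__init__()
        # Extra fixed-length input sequence (learned here for simplicity).
        self.p = nn.Parameter(torch.randn(pack_len, d_model))

    def forward(self, x):
        # x: (batch, n, d_model); n may be very long.
        b = x.size(0)
        p = self.p.unsqueeze(0).expand(b, -1, -1)    # (b, l, d)
        packed = attention(p, x, x)                  # pack step:   O(n * l)
        unpacked = attention(x, packed, packed)      # unpack step: O(n * l)
        # The packed sequence is the additional fixed-length output
        # mentioned in the abstract; the unpacked sequence has length n.
        return unpacked, packed


# Usage: cost grows linearly with n because pack_len is a fixed constant.
luna = LunaAttention(d_model=64, pack_len=16)
y, p_out = luna(torch.randn(2, 1024, 64))
print(y.shape, p_out.shape)  # torch.Size([2, 1024, 64]) torch.Size([2, 16, 64])
```

Because each of the two attention calls attends between a length-n sequence and a length-l sequence, the total time and memory are O(n·l) rather than O(n²).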