The Transformer is a translation architecture based entirely on attention, proposed in Google's paper "Attention Is All You Need".

Paper link: https://www.zhuanzhi.ai/paper/3d04de5c54e6026e7a6090e9b64017d3

Transformer models have been widely applied in natural language processing, computer vision, speech, and many other fields, achieving excellent results. For very long input sequences, however, Transformers are severely constrained, because their core component, the self-attention mechanism, makes computation and memory complexity grow quadratically with sequence length. To curb this growth, Microsoft Research Asia proposed a novel two-level attention scheme, PoolingFormer, which has been shown to perform well on the Natural Questions and TyDi QA datasets and on arXiv summarization.
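
To make the quadratic growth concrete, here is a minimal NumPy sketch of vanilla single-head self-attention (our illustration, not from the paper; the sizes and random weights are toy placeholders). The (n, n) score matrix it materializes is exactly what scales quadratically with sequence length:

```python
import numpy as np

def full_self_attention(x, wq, wk, wv):
    """Vanilla single-head self-attention. The (n, n) score matrix
    below is what makes compute and memory grow quadratically in n."""
    q, k, v = x @ wq, x @ wk, x @ wv                 # each (n, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (n, d) outputs

n, d = 8, 4                                          # toy sizes
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))                          # token representations
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
print(full_self_attention(x, wq, wk, wv).shape)      # (8, 4)
```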

In self-attention, a token's representation can be described as a weighted sum of the representations of the neighbors within its field of view. In general, the farther a token can "see", the better the performance, but the higher the computational cost. Researchers at Microsoft Research Asia observed that for a token's representation, the nearest neighbors matter most, while ever more distant neighbors carry increasingly redundant information. Based on this observation, they explored a more efficient self-attention mechanism.
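
In standard notation (our formulation, not the page's), this weighted-sum view of a token's updated representation is

$$h_i' = \sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, v_j, \qquad \alpha_{ij} = \frac{\exp\left(q_i^\top k_j / \sqrt{d}\right)}{\sum_{j' \in \mathcal{N}(i)} \exp\left(q_i^\top k_{j'} / \sqrt{d}\right)},$$

where $\mathcal{N}(i)$ is the set of positions token $i$ can "see". Enlarging $\mathcal{N}(i)$ widens the field of view but raises the cost, which is exactly the trade-off PoolingFormer targets.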

PoolingFormer modifies the original full attention into a two-level attention mechanism. The first level uses sliding-window attention, restricting each token to attend only to its nearby neighbors. The second level uses pooling attention: it adopts a larger window to enlarge each token's receptive field, while applying pooling operations to compress the key and value vectors and thus reduce the number of tokens participating in the attention computation. This multi-level design, combining sliding-window attention with pooling attention, significantly reduces computational cost and memory consumption while retaining excellent model performance. Compared with the original attention mechanism, PoolingFormer's computation and memory complexity grow only linearly with sequence length.
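
The sketch below illustrates the two-level idea in NumPy (our simplification: single head, identity projections, mean pooling, no residual connections or normalization, all of which the actual PoolingFormer adds; the window and pool sizes are arbitrary toy values):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def two_level_attention(x, radius=2, radius2=8, pool_size=4):
    """Level 1: sliding-window attention over nearby tokens only.
    Level 2: attention over mean-pooled keys/values from a larger
    window, enlarging the receptive field while shrinking the
    number of attended entries."""
    n, d = x.shape

    # Level 1: each token attends only to neighbors within `radius`.
    level1 = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        w = softmax(x[lo:hi] @ x[i] / np.sqrt(d))    # weights over the window
        level1[i] = w @ x[lo:hi]                     # weighted neighbor sum

    # Level 2: within a larger window, mean-pool blocks of level-1 outputs
    # into compressed keys/values, then attend over those few entries.
    out = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - radius2), min(n, i + radius2 + 1)
        pooled = np.array([level1[j:j + pool_size].mean(axis=0)
                           for j in range(lo, hi, pool_size)])  # (~window/p, d)
        w = softmax(level1[i] @ pooled.T / np.sqrt(d))
        out[i] = w @ pooled
    return out

x = np.random.default_rng(1).normal(size=(16, 8))
print(two_level_attention(x).shape)  # (16, 8)
```

With fixed `radius`, `radius2`, and `pool_size`, each token attends to a constant number of entries at both levels, so the total cost grows linearly with sequence length, matching the complexity claim above.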

Latest Papers

Owing to the Transformer's superiority in learning long-term dependencies, the sign language Transformer model has achieved remarkable progress in Sign Language Recognition (SLR) and Translation (SLT). However, several issues with the Transformer prevent it from understanding sign language better. First, the self-attention mechanism learns sign video representations in a frame-wise manner, neglecting the temporal semantic structure of sign gestures. Second, the attention mechanism with absolute position encoding is direction- and distance-unaware, which limits its ability. To address these issues, we propose a new model architecture, namely PiSLTRc, with two distinctive characteristics: (i) content-aware and position-aware convolution layers. Specifically, we explicitly select relevant features using a novel content-aware neighborhood gathering method, then aggregate these features with position-informed temporal convolution layers, generating robust neighborhood-enhanced sign representations. (ii) injecting relative position information into the attention mechanism in the encoder, the decoder, and even the encoder-decoder cross attention. Compared with the vanilla Transformer model, our model performs consistently better on three large-scale sign language benchmarks: PHOENIX-2014, PHOENIX-2014-T and CSL. Furthermore, extensive experiments demonstrate that the proposed method achieves state-of-the-art translation quality with a $+1.6$ BLEU improvement.
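
To illustrate the relative-position idea the abstract refers to, below is a generic sketch of attention with a relative-position bias (a standard technique in the same spirit, not PiSLTRc's exact formulation; the bias table is random here purely for demonstration, where a trained model would learn it):

```python
import numpy as np

def relative_position_attention(q, k, v, max_rel=4, seed=0):
    """Attention whose score for the pair (i, j) also depends on the
    clipped signed offset j - i, making it aware of both distance and
    direction, unlike absolute position encodings."""
    n, d = q.shape
    rng = np.random.default_rng(seed)
    bias_table = rng.normal(size=2 * max_rel + 1)  # one bias per offset
    offsets = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                      -max_rel, max_rel)           # offsets[i, j] = j - i
    scores = q @ k.T / np.sqrt(d) + bias_table[offsets + max_rel]
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

n, d = 6, 4
rng = np.random.default_rng(1)
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
print(relative_position_attention(q, k, v).shape)  # (6, 4)
```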
