The idea comes from the visual mechanism: it is a process of abstracting information.

VIP content

Paper link: https://www.zhuanzhi.ai/paper/3d04de5c54e6026e7a6090e9b64017d3

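Transformer models have been widely applied in natural language processing, computer vision, speech, and many other fields, with excellent results. For very long input sequences, however, Transformers are severely limited, because their core component, the self-attention mechanism, makes computation and memory complexity grow quadratically with sequence length. To limit this growth, Microsoft Research Asia proposed a novel two-level attention scheme, PoolingFormer, which has been shown to achieve good results on the Natural Questions, TyDi QA, and arXiv summarization datasets.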

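In self-attention, a token's representation can be described, in short, as a weighted sum of the representations of the neighbors within its field of view. In general, the farther a token can "see", the better the performance, but the higher the computational complexity. Researchers at Microsoft Research Asia observed that, for a token's representation, the nearest neighbors matter most, while more distant neighbors carry more redundant information. Based on this observation, they explored a more efficient self-attention mechanism.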
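The "weighted sum over visible neighbors" view can be made concrete with a small sketch. The function name `windowed_attention` and the parameter `window` are hypothetical, and the learned query, key, and value projections of a real Transformer are left out (an assumption made to keep the example short), so each token serves as its own query, key, and value.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def windowed_attention(tokens, window):
    """For each token, return the weighted sum of neighbors within `window` positions.

    Assumption: no learned projections; tokens act as queries, keys, and values.
    """
    n, dim = len(tokens), len(tokens[0])
    outputs, weights = [], []
    for i in range(n):
        # The token's field of view: positions i - window .. i + window.
        neighbors = tokens[max(0, i - window): i + window + 1]
        scores = [
            sum(q * k for q, k in zip(tokens[i], other)) / math.sqrt(dim)
            for other in neighbors
        ]
        w = softmax(scores)
        # New representation = weighted sum of the visible neighbors.
        outputs.append(
            [sum(wj * other[c] for wj, other in zip(w, neighbors)) for c in range(dim)]
        )
        weights.append(w)
    return outputs, weights


tokens = [[float(i), 1.0] for i in range(8)]
outputs, weights = windowed_attention(tokens, window=2)
```

Setting `window` to the sequence length recovers full attention, where every token scores every other token and the cost grows quadratically; a fixed `window` keeps the work per token constant.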

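PoolingFormer replaces the original full attention with a two-level attention mechanism. The first level uses sliding-window attention, restricting each token to attend only to nearby neighbors. The second level uses pooling attention: it adopts a larger window to enlarge each token's receptive field, and applies a pooling operation to compress the key and value vectors, reducing the number of tokens that take part in the attention computation. This multi-level design, combining sliding-window attention with pooling attention, significantly reduces computational cost and memory consumption while still achieving excellent model performance. Compared with the original attention mechanism, PoolingFormer's computation and memory complexity grow only linearly with sequence length.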
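A rough sketch of the two-level idea follows. It is not the authors' implementation: the names `two_level_attention`, `w1`, `w2`, and `pool_size` are hypothetical, mean pooling is an assumed choice of pooling operation, feeding the first-level output into the second level with a residual sum is an assumed way of combining the levels, and learned projections are omitted.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a small set of keys/values."""
    dim = len(query)
    w = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys])
    return [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(dim)]


def mean_pool(vectors, size):
    """Compress each run of `size` consecutive vectors into their mean (assumed pooling op)."""
    pooled = []
    for start in range(0, len(vectors), size):
        group = vectors[start:start + size]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def two_level_attention(tokens, w1, w2, pool_size):
    n = len(tokens)
    # Level 1: sliding-window attention over close neighbors only.
    level1 = []
    for i in range(n):
        near = tokens[max(0, i - w1): i + w1 + 1]
        level1.append(attend(tokens[i], near, near))
    # Level 2: a larger window whose keys/values are compressed by pooling.
    level2 = []
    for i in range(n):
        far = level1[max(0, i - w2): i + w2 + 1]
        pooled = mean_pool(far, pool_size)
        context = attend(level1[i], pooled, pooled)
        # Assumption: combine the two levels with a residual sum.
        level2.append([a + b for a, b in zip(level1[i], context)])
    return level2


def keys_per_token(w1, w2, pool_size):
    """Upper bound on keys each token attends to; note it does not depend on sequence length."""
    return (2 * w1 + 1) + math.ceil((2 * w2 + 1) / pool_size)


tokens = [[math.sin(i), math.cos(i)] for i in range(32)]
result = two_level_attention(tokens, w1=2, w2=8, pool_size=4)
```

Because `keys_per_token` is a constant for fixed `w1`, `w2`, and `pool_size`, the total number of attention scores is that constant times the sequence length, which is the linear growth the paragraph above describes.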


Latest papers

We consider a secret-sharing model where a dealer distributes the shares of a secret among a set of participants with the constraint that only predetermined subsets of participants must be able to reconstruct the secret by pooling their shares. Our study generalizes Shamir's secret-sharing model in three directions. First, we allow a joint design of the protocols for the creation of the shares and the distribution of the shares, instead of constraining the model to independent designs. Second, instead of assuming that the participants and the dealer have access to information-theoretically secure channels at no cost, we assume that they have access to a public channel and correlated randomness. Third, motivated by a wireless network setting where the correlated randomness is obtained from channel gain measurements, we explore a setting where the dealer is an entity made of multiple sub-dealers. Our main results are inner and outer regions for the achievable secret rates that the dealer and the participants can obtain in this model. To this end, we develop two new achievability techniques, a first one to successively handle reliability and security constraints in a distributed setting, and a second one to reduce a multi-dealer setting to multiple single-user dealer settings. Our results yield the capacity region for threshold access structures when the correlated randomness corresponds to pairwise secret keys shared between each sub-dealer and each participant, and the capacity for the all-or-nothing access structure in the presence of a single dealer and arbitrarily correlated randomness.
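For orientation, the classical threshold scheme that the abstract takes as its starting point can be sketched as below. This is Shamir's textbook (t, n) construction over a prime field, not the protocol of the paper; the names `make_shares` and `reconstruct` and the field size `PRIME` are illustrative assumptions, and the sketch presumes the secure share-distribution channels that the abstract explicitly does not assume.

```python
import secrets

# Assumption: a 127-bit Mersenne prime as the field size, chosen only for illustration.
PRIME = 2**127 - 1


def make_shares(secret, threshold, n_participants):
    """Split `secret` so that any `threshold` of the shares can reconstruct it."""
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_participants + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = make_shares(123456789, threshold=3, n_participants=5)
recovered = reconstruct(shares[:3])
```

Any three of the five shares pooled together recover the secret, which corresponds to the threshold access structure mentioned in the abstract.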
