We present a very simple algorithm for attention that requires $O(1)$ memory with respect to sequence length and an extension to self-attention that requires $O(\log n)$ memory. This is in contrast with the frequently stated belief that self-attention requires $O(n^2)$ memory. While the time complexity is still $O(n^2)$, device memory rather than compute capability is often the limiting factor on modern accelerators. Thus, reducing the memory requirements of attention allows processing of longer sequences than might otherwise be feasible. We provide a practical implementation for accelerators that requires $O(\sqrt{n})$ memory, is numerically stable, and is within a few percent of the runtime of the standard implementation of attention. We also demonstrate how to differentiate the function while remaining memory-efficient. For sequence length 16384, the memory overhead of self-attention is reduced by 59X for inference and by 32X for differentiation.
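The abstract itself contains no code, so the following is a minimal illustrative sketch, in JAX, of the constant-memory idea it describes for a single query: scores are consumed one key at a time while maintaining only a running maximum, a running softmax normalizer, and a rescaled weighted sum of values. The function name `streaming_attention_single_query` and all implementation details are assumptions for illustration, not the paper's actual implementation (which additionally chunks queries and keys to reach the stated $O(\sqrt{n})$ practical memory bound).

```python
import jax
import jax.numpy as jnp

def streaming_attention_single_query(q, keys, values):
    """Attention for one query, scanning over keys/values one at a time.

    The accumulator size is independent of the number of keys: we keep only
    a running max `m` (for numerical stability), a running normalizer `s`,
    and a running weighted value sum `acc`, rescaling the old accumulators
    whenever the running max increases.
    """
    d = values.shape[-1]

    def step(carry, kv):
        m, s, acc = carry
        k, v = kv
        x = jnp.dot(q, k)                 # unnormalized score for this key
        m_new = jnp.maximum(m, x)         # updated running max
        scale = jnp.exp(m - m_new)        # rescale previous accumulators
        w = jnp.exp(x - m_new)            # weight of the current key
        s_new = s * scale + w
        acc_new = acc * scale + w * v
        return (m_new, s_new, acc_new), None

    init = (jnp.array(-jnp.inf, dtype=q.dtype),   # running max
            jnp.array(0.0, dtype=q.dtype),        # running normalizer
            jnp.zeros(d, dtype=q.dtype))          # running weighted value sum
    (m, s, acc), _ = jax.lax.scan(step, init, (keys, values))
    return acc / s

# Sanity check against standard softmax attention for one query.
q = jax.random.normal(jax.random.PRNGKey(0), (64,))
ks = jax.random.normal(jax.random.PRNGKey(1), (128, 64))
vs = jax.random.normal(jax.random.PRNGKey(2), (128, 64))
expected = jax.nn.softmax(ks @ q) @ vs
print(jnp.allclose(streaming_attention_single_query(q, ks, vs), expected, atol=1e-5))
```

Applying this per-query procedure across all $n$ queries is what yields the abstract's $O(\log n)$ bound for self-attention (the logarithmic factor coming from index bookkeeping), while the practical accelerator implementation processes queries and keys in chunks to trade a small amount of memory for speed.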