闪电注意:快速和内存-高效实时注意,有IO-意识 (FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness) - 专知论文

会员服务 ·

0

Attention · 可约的 · 确切的 · 近似 · 模型评估 ·

2022 年 6 月 23 日

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

翻译：闪电注意:快速和内存-高效实时注意,有IO-意识

Tri Dao,Daniel Y. Fu,Stefano Ermon,Atri Rudra,Christopher Ré

Transformers are slow and memory-hungry on long sequences, since the time and memory complexity of self-attention are quadratic in sequence length. Approximate attention methods have attempted to address this problem by trading off model quality to reduce the compute complexity, but often do not achieve wall-clock speedup. We argue that a missing principle is making attention algorithms IO-aware -- accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. We analyze the IO complexity of FlashAttention, showing that it requires fewer HBM accesses than standard attention, and is optimal for a range of SRAM sizes. We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method. FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3$\times$ speedup on GPT-2 (seq. length 1K), and 2.4$\times$ speedup on long-range arena (seq. length 1K-4K). FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexity on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).

翻译：64 变异器在长序列上是缓慢的,而记忆-饥饿是长序列上是缓慢的,因为自我注意的时间和记忆复杂性在序列长度上是四倍的。近距离关注方法试图通过交换模型质量来解决这一问题,以减少计算复杂性,但往往没有实现倒时钟加速。我们争论说, 缺少的原则是关注算法 IO- 觉 -- 计算读数和写数在 GPU 记忆级别之间的值。我们提议了 FlashAtention, 一种IO- 觉觉察精确的注意算法, 用来降低GPU高带内存(HBMB) 和 GPUT Streal Right SRA 之间的记忆读数( HBM) 。我们分析了 IMO 复杂性, 显示它需要比标准关注范围少的 HBBMBM- 2- TL) 递增速度( K- Trightal- moreal- deal- deal- deal- deal- deal- dreal- dreal lax) 16 K- dreal- dreal- dreal- dreal- dreal- dreal- dreax 和在1 K- dreal- drealdalx 和1 K- dreal- dreald- drealx 和1 Kxx 上, 在1 K- devalx 上, 和1 K- slock- devalxxxx 上, 16- devit- devalx 和1 Kx 和1 Kx 和1x 上, 和1 K- deval-l-l-l-l-l-l-l-l-l-l-l-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 上, 上, 上, 16-l-l-lxxxxxxxxxxxxxxxxxx, 16-lx, 16-l-l-l-l-l-l

1

相关内容

Attention

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

专知会员服务

78+阅读 · 2020年2月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

虚拟化光纤-无线融合宽带接入网中资源调度机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

山楂属植物中苹果褪绿叶斑病毒全基因组解析及其致病机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

蛋白质相互作用及结合位点的预测方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

高计数率多通道核辐射探测器读出芯片研究

国家自然科学基金

0+阅读 · 2013年12月31日

洋葱细胞质雄性不育特异性表达基因orf725的功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于时频二维训练信息的高谱效多天线TFT-OFDM技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

恶性疟疾致病相关var基因非编码核酸序列的表达调控及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Pharicin B稳定维甲酸受体的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

宽带数字阵列波束形成及三维成像方法

国家自然科学基金

0+阅读 · 2009年12月31日

人偏肺病毒组织嗜性的基因基础的研究

国家自然科学基金

0+阅读 · 2008年12月31日

Self-Supervised Vision Transformers for Malware Detection

Arxiv

0+阅读 · 2022年8月15日

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning

Arxiv

0+阅读 · 2022年8月15日

Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers

Arxiv

0+阅读 · 2022年8月15日

EDTER: Edge Detection with Transformer

Arxiv

11+阅读 · 2022年3月16日

Poolingformer: Long Document Modeling with Pooling Attention

Arxiv

14+阅读 · 2021年5月10日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

Dynamic Zoom-in Network for Fast Object Detection in Large Images

Arxiv

20+阅读 · 2018年3月27日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

VIP会员

文章信息

相关主题

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

【MIT深度学习课程】深度序列建模，Deep Sequence Modeling

专知会员服务

78+阅读 · 2020年2月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【新书】《知识图谱与大语言模型的协同应用》，544页pdf

军事通信系统：安全行动的支柱

《缓解大语言模型（LLMs）幻觉：面向应用的检索增强生成（RAG）、推理与智能体系统综述》

【新书】机器学习系统，2620页pdf

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

相关论文

Self-Supervised Vision Transformers for Malware Detection

Arxiv

0+阅读 · 2022年8月15日

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning

Arxiv

0+阅读 · 2022年8月15日

Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers

Arxiv

0+阅读 · 2022年8月15日

EDTER: Edge Detection with Transformer

Arxiv

11+阅读 · 2022年3月16日

Poolingformer: Long Document Modeling with Pooling Attention

Arxiv

14+阅读 · 2021年5月10日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

Dynamic Zoom-in Network for Fast Object Detection in Large Images

Arxiv

20+阅读 · 2018年3月27日

Order-Free RNN with Visual Attention for Multi-Label Classification

Arxiv

16+阅读 · 2017年12月20日

相关基金

虚拟化光纤-无线融合宽带接入网中资源调度机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

山楂属植物中苹果褪绿叶斑病毒全基因组解析及其致病机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

蛋白质相互作用及结合位点的预测方法研究

国家自然科学基金

2+阅读 · 2013年12月31日

高计数率多通道核辐射探测器读出芯片研究

国家自然科学基金

0+阅读 · 2013年12月31日

洋葱细胞质雄性不育特异性表达基因orf725的功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于时频二维训练信息的高谱效多天线TFT-OFDM技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

恶性疟疾致病相关var基因非编码核酸序列的表达调控及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Pharicin B稳定维甲酸受体的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

宽带数字阵列波束形成及三维成像方法

国家自然科学基金

0+阅读 · 2009年12月31日

人偏肺病毒组织嗜性的基因基础的研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员