ATTACC 注意层的赤道交瓶点 (ATTACC the Quadratic Bottleneck of Attention Layers) - 专知论文

会员服务 ·

0

state-of-the-art · 注意力机制 · 层 · Better · Backbone ·

2021 年 7 月 13 日

ATTACC the Quadratic Bottleneck of Attention Layers

翻译：ATTACC 注意层的赤道交瓶点

Sheng-Chun Kao,Suvinay Subramanian,Gaurav Agrawal,Tushar Krishna

Attention mechanisms form the backbone of state-of-the-art machine learning models for a variety of tasks. Deploying them on deep neural network (DNN) accelerators, however, is prohibitively challenging especially under long sequences. Operators in attention layers exhibit limited reuse and quadratic growth in memory footprint, leading to severe memory-boundedness. This paper introduces a new attention-tailored dataflow, termed FLAT, which leverages operator fusion, loop-nest optimizations, and interleaved execution. It increases the effective memory bandwidth by efficiently utilizing the high-bandwidth, low-capacity on-chip buffer and thus achieves better run time and compute resource utilization. We term FLAT-compatible accelerators ATTACC. In our evaluation, ATTACC achieves 1.94x and 1.76x speedup and 49% and 42% of energy reduction comparing to state-of-the-art edge and cloud accelerators.

翻译：关注机制是各种任务最先进的机器学习模型的支柱。但是,在深神经网络(DNN)加速器(DNN)加速器上部署它们尤其具有巨大的挑战性。关注层的操作员在记忆足迹上表现出有限的再利用和二次增长,导致严重的记忆束缚。本文介绍了一种新的关注量数据流, 称为FLAT, 利用操作员的聚合、循环内优化和间断执行。它通过有效利用高带宽、低容量的芯片缓冲器来增加有效的记忆带宽, 从而实现更好的运行时间和计算资源的利用。我们称之为FLAT- 兼容加速器ATACT。在我们的评估中, ATACC 实现了1.94x 和 1.76x 速度, 以及 49% 和 42% 的能源减少量, 与最先进的边缘和云加速器相比。

0

相关内容

state-of-the-art

state-of-the-art

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

伯克利ISSCC2021《脑机接口:基础到未来技术》，附93页ppt与视频

伯克利ISSCC2021《脑机接口:基础到未来技术》，附93页ppt与视频

专知会员服务

49+阅读 · 2021年3月26日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

320+阅读 · 2020年11月26日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【经典书】C++解决问题第七版，1074pdf，Problem Solving with C++

【经典书】C++解决问题第七版，1074pdf，Problem Solving with C++

专知会员服务

77+阅读 · 2020年2月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

Yoshua Bengio，使算法知道“为什么”

Yoshua Bengio，使算法知道“为什么”

专知会员服务

8+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

18+阅读 · 2019年2月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

从Seq2seq到Attention模型到Self Attention（二）

从Seq2seq到Attention模型到Self Attention（二）

量化投资与机器学习

23+阅读 · 2018年10月9日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

(TensorFlow)实时语义分割比较研究

(TensorFlow)实时语义分割比较研究

机器学习研究会

9+阅读 · 2018年3月12日

【音乐】Attention

【音乐】Attention

英语演讲视频每日一推

3+阅读 · 2017年8月22日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Adversarial Attacks Against Deep Reinforcement Learning Framework in Internet of Vehicles

Arxiv

0+阅读 · 2021年9月16日

Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs

Arxiv

0+阅读 · 2021年9月16日

Deflecting Adversarial Attacks

Deflecting Adversarial Attacks

Arxiv

8+阅读 · 2020年2月18日

Meta-Learning with Implicit Gradients

Meta-Learning with Implicit Gradients

Arxiv

13+阅读 · 2019年9月10日

Area Attention

Arxiv

5+阅读 · 2019年5月23日

Self-Attention Graph Pooling

Self-Attention Graph Pooling

Arxiv

5+阅读 · 2019年4月17日

The Evolved Transformer

The Evolved Transformer

Arxiv

5+阅读 · 2019年1月30日

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

Arxiv

39+阅读 · 2019年1月17日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Attention Is All You Need

Arxiv

27+阅读 · 2017年12月6日

VIP会员

文章信息

相关主题

state-of-the-art

注意力机制

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

伯克利ISSCC2021《脑机接口:基础到未来技术》，附93页ppt与视频

伯克利ISSCC2021《脑机接口:基础到未来技术》，附93页ppt与视频

专知会员服务

49+阅读 · 2021年3月26日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

320+阅读 · 2020年11月26日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【经典书】C++解决问题第七版，1074pdf，Problem Solving with C++

【经典书】C++解决问题第七版，1074pdf，Problem Solving with C++

专知会员服务

77+阅读 · 2020年2月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

Yoshua Bengio，使算法知道“为什么”

Yoshua Bengio，使算法知道“为什么”

专知会员服务

8+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能绝不能完全自主》

《人工智能的法律与伦理：军事自主机器独特挑战的深度剖析》316页

从数据到主导：AI与兵棋推演构筑决策优势

《特洛伊木马货柜：武器化集装箱的战略威胁》最新报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

18+阅读 · 2019年2月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

从Seq2seq到Attention模型到Self Attention（二）

从Seq2seq到Attention模型到Self Attention（二）

量化投资与机器学习

23+阅读 · 2018年10月9日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

(TensorFlow)实时语义分割比较研究

(TensorFlow)实时语义分割比较研究

机器学习研究会

9+阅读 · 2018年3月12日

【音乐】Attention

【音乐】Attention

英语演讲视频每日一推

3+阅读 · 2017年8月22日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Adversarial Attacks Against Deep Reinforcement Learning Framework in Internet of Vehicles

Arxiv

0+阅读 · 2021年9月16日

Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs

Arxiv

0+阅读 · 2021年9月16日

Deflecting Adversarial Attacks

Deflecting Adversarial Attacks

Arxiv

8+阅读 · 2020年2月18日

Meta-Learning with Implicit Gradients

Meta-Learning with Implicit Gradients

Arxiv

13+阅读 · 2019年9月10日

Area Attention

Arxiv

5+阅读 · 2019年5月23日

Self-Attention Graph Pooling

Self-Attention Graph Pooling

Arxiv

5+阅读 · 2019年4月17日

The Evolved Transformer

The Evolved Transformer

Arxiv

5+阅读 · 2019年1月30日

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

A Survey of the Recent Architectures of Deep Convolutional Neural Networks

Arxiv

39+阅读 · 2019年1月17日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Attention Is All You Need

Arxiv

27+阅读 · 2017年12月6日

微信扫码咨询专知VIP会员