Transformers have been considered among the most important deep learning models since 2018, in part because they have established state-of-the-art (SOTA) records and could potentially replace existing deep neural networks (DNNs). Despite these remarkable achievements, the prolonged turnaround time of Transformer models is a widely recognized roadblock. The variety of sequence lengths imposes additional computing overhead: inputs must be zero-padded to the maximum sentence length in the batch to accommodate parallel computing platforms. This paper targets the field-programmable gate array (FPGA) and proposes a coherent sequence-length-adaptive algorithm-hardware co-design for Transformer acceleration. In particular, we develop a hardware-friendly sparse attention operator and a length-aware hardware resource scheduling algorithm. The proposed sparse attention operator reduces the complexity of attention-based models to linear and alleviates off-chip memory traffic. The proposed length-aware hardware resource scheduling algorithm dynamically allocates hardware resources to fill pipeline slots and eliminate bubbles for NLP tasks. Experiments show that our design incurs very small accuracy loss and achieves 80.2$\times$ and 2.6$\times$ speedups over CPU and GPU implementations, respectively, as well as 4$\times$ higher energy efficiency than a state-of-the-art GPU accelerator optimized via cuBLAS GEMM.
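To make the linear-complexity claim concrete, the sketch below shows one common family of sparse attention: windowed (local) attention, where each query attends only to keys within a fixed-radius window, so total work grows linearly in sequence length rather than quadratically. This is an illustrative stand-in written for this abstract, not the paper's actual hardware operator; the function name `local_sparse_attention` and the `window` parameter are assumptions for the example.

```python
import numpy as np

def local_sparse_attention(Q, K, V, window=4):
    """Windowed (local) sparse attention sketch.

    Each query position i attends only to key/value positions in
    [i - window, i + window], so the cost is O(n * window * d)
    instead of the O(n^2 * d) of dense attention. This is a generic
    illustration, not the operator proposed in the paper.
    """
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores restricted to the local window.
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)
        # Numerically stable softmax over the window.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ V[lo:hi]
    return out
```

Because every query touches at most `2 * window + 1` keys, both the compute and the working set per query are constant, which is also why such patterns reduce off-chip memory traffic on bandwidth-bound accelerators.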