Long Short-Term Memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data such as speech recognition. Unlike previous LSTM accelerators that exploit either spatial weight sparsity or temporal activation sparsity, this paper proposes a new accelerator called "Spartus" that exploits spatio-temporal sparsity to achieve ultra-low-latency inference. Spatial sparsity is induced using a new Column-Balanced Targeted Dropout (CBTD) structured pruning method, which produces structured sparse weight matrices for a balanced workload. The pruned networks running on Spartus hardware achieve weight sparsity of up to 96% and 94% with negligible accuracy loss on the TIMIT and Librispeech datasets, respectively. To induce temporal sparsity in LSTMs, we extend the previous DeltaGRU method to the DeltaLSTM method. Combining spatio-temporal sparsity with CBTD and DeltaLSTM saves on weight memory access and associated arithmetic operations. The Spartus architecture is scalable and supports real-time online speech recognition when implemented on small and large FPGAs. The Spartus per-sample latency for a single DeltaLSTM layer of 1024 neurons averages 1 µs. Exploiting spatio-temporal sparsity leads to a 46X speedup of Spartus over its theoretical hardware performance, achieving 9.4 TOp/s effective batch-1 throughput and 1.1 TOp/s/W power efficiency.
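To make the column-balanced pruning idea concrete, here is a minimal NumPy sketch; the function name `column_balanced_prune_mask` is ours, not from the paper, and the full CBTD method applies this selection stochastically as targeted dropout during training rather than as a one-shot mask. The key property illustrated is that every column retains the same number of nonzero weights, so each hardware column lane receives an equal share of the work.

```python
import numpy as np

def column_balanced_prune_mask(W, sparsity):
    """Binary mask that keeps the largest-magnitude weights independently
    in each column, so every column retains the same number of nonzeros
    (a balanced per-column workload for the hardware)."""
    rows, _cols = W.shape
    keep = max(1, int(round(rows * (1.0 - sparsity))))
    mask = np.zeros_like(W, dtype=bool)
    for c in range(W.shape[1]):
        top = np.argsort(np.abs(W[:, c]))[-keep:]  # row indices of the largest |w|
        mask[top, c] = True
    return mask

# Usage: prune a 1024x1024 layer to 90% sparsity with balanced columns.
rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))
W_pruned = W * column_balanced_prune_mask(W, sparsity=0.9)
col_nnz = (W_pruned != 0).sum(axis=0)
assert (col_nnz == col_nnz[0]).all()  # identical nonzero count per column
```

This per-column balance is what distinguishes CBTD-style structured sparsity from unstructured magnitude pruning, which would leave some processing elements idle while others finish late.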
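Similarly, a toy sketch of the delta-network principle behind DeltaLSTM, assuming the standard DeltaGRU/DeltaLSTM formulation: a vector element fires only when it has changed by at least a threshold Θ since the value it last propagated, and the pre-activation is maintained incrementally so that non-firing columns of the weight matrix are skipped. The class name `DeltaMatVec` is illustrative; DeltaLSTM applies this update to both the input and hidden-state vectors of every gate.

```python
import numpy as np

class DeltaMatVec:
    """Toy delta-network matrix-vector product: columns of W whose input
    element has not changed by at least `theta` since its last propagated
    value contribute a zero delta and can be skipped entirely."""

    def __init__(self, W, theta):
        self.W = W
        self.theta = theta
        self.x_ref = np.zeros(W.shape[1])  # last propagated input values
        self.z = np.zeros(W.shape[0])      # running pre-activation memory

    def step(self, x_t):
        delta = x_t - self.x_ref
        fire = np.abs(delta) >= self.theta       # which elements exceed theta
        self.x_ref[fire] = x_t[fire]             # update reference only where fired
        self.z += self.W[:, fire] @ delta[fire]  # hardware skips non-firing columns
        return self.z, fire.mean()               # pre-activation and firing rate

# Usage: a slowly changing input fires almost no deltas after the first step.
mv = DeltaMatVec(W=np.random.default_rng(1).standard_normal((4, 8)), theta=0.1)
z, rate = mv.step(np.ones(8))          # first step: everything fires (rate == 1.0)
z, rate = mv.step(np.ones(8) + 0.01)   # change below theta: nothing fires, z unchanged
```

Because speech features evolve slowly between frames, most deltas fall below Θ, which is the source of the temporal-sparsity savings in memory access and arithmetic that the abstract reports.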