Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-temporal Sparsity - Zhuanzhi Papers


Specialization · Long Short-Term Memory (LSTM) networks · Weight · Networking · Targeted Dropout

August 11, 2021

Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-temporal Sparsity


Chang Gao,Tobi Delbruck,Shih-Chii Liu

Long Short-Term Memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data, such as speech recognition. However, it is difficult to deploy these networks on hardware with high throughput and low latency because their fully-connected structure makes LSTM inference memory-bound. Previous LSTM accelerators exploited either spatial sparsity in the weights or temporal sparsity, but not both. In this paper, we present a new accelerator called "Spartus" that exploits spatio-temporal sparsity to achieve ultra-low-latency inference. Spatial sparsity is induced by our proposed pruning method, Column-Balanced Targeted Dropout (CBTD), which produces structured sparse weight matrices that balance the workload. CBTD achieved up to 96% weight sparsity with negligible accuracy difference for an LSTM network trained on a TIMIT phone recognition task. To induce temporal sparsity in LSTM, we create the DeltaLSTM by extending the previous DeltaGRU method to the LSTM network. The combined sparsity reduces both weight memory accesses and the associated arithmetic operations. Spartus was implemented on a Xilinx Zynq-7100 FPGA. The per-sample latency for a single DeltaLSTM layer of 1024 neurons running on Spartus is 1 µs. Spartus achieved 9.4 TOp/s effective batch-1 throughput and 1.1 TOp/J energy efficiency, which are respectively 4× and 7× higher than the previous state-of-the-art.
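To make the two kinds of sparsity concrete, here is a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not give the details of CBTD or DeltaLSTM. The assumptions are these: `prune_columns_balanced` is a simplified, deterministic magnitude-pruning stand-in that leaves the same number of nonzero weights in every column (the real CBTD is a dropout-based training method); `delta_step` follows the general delta-network idea of propagating only input changes larger than a threshold `theta`. All function names, the threshold, and the toy sizes are hypothetical.

```python
import random


def prune_columns_balanced(W, keep_per_column):
    """Zero all but the `keep_per_column` largest-magnitude weights per column.

    Simplified stand-in for column-balanced pruning (an assumption, not CBTD):
    every column ends up with the same number of nonzeros, so every column
    costs the same amount of work.
    """
    rows, cols = len(W), len(W[0])
    pruned = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        ranked = sorted(range(rows), key=lambda i: abs(W[i][j]), reverse=True)
        for i in ranked[:keep_per_column]:
            pruned[i][j] = W[i][j]
    return pruned


def delta_step(W, x, x_ref, acc, theta):
    """One delta-encoded matrix-vector update; returns the MAC count.

    `x_ref` holds the last propagated value of each input and `acc` holds the
    running product W @ x_ref. Both are updated in place.
    """
    macs = 0
    for j, xj in enumerate(x):
        d = xj - x_ref[j]
        if abs(d) <= theta:
            continue  # temporal sparsity: skip this whole column
        x_ref[j] = xj  # remember the value that was actually propagated
        for i in range(len(W)):
            w = W[i][j]
            if w != 0.0:  # spatial sparsity: skip pruned weights
                acc[i] += w * d
                macs += 1
    return macs


# Toy demo: an 8x6 weight matrix pruned to 2 nonzeros per column.
random.seed(0)
ROWS, COLS, KEEP, THETA = 8, 6, 2, 0.1
W = [[random.uniform(-1, 1) for _ in range(COLS)] for _ in range(ROWS)]
W_sparse = prune_columns_balanced(W, KEEP)

x_ref = [0.0] * COLS
acc = [0.0] * ROWS
x1 = [0.5, -0.3, 0.0, 0.8, 0.05, -0.6]
x2 = [0.52, -0.3, 0.0, 0.2, 0.05, -0.6]  # only index 3 moves by more than THETA
macs1 = delta_step(W_sparse, x1, x_ref, acc, THETA)
macs2 = delta_step(W_sparse, x2, x_ref, acc, THETA)
print("MACs per step:", macs1, macs2, "dense would be", ROWS * COLS)
```

In this toy run the first step visits four of the six columns and the second step visits one, and each visited column costs at most `KEEP` multiply-accumulates. This is the sense in which the combined sparsity saves weight fetches and arithmetic at the same time. Because each column has an equal number of nonzeros, the per-column cost is uniform, which is what makes it straightforward to divide the work evenly across parallel processing elements.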


