深神经网络中完全连接层的低长期 CMOS 硬件加速器 (Low Latency CMOS Hardware Acceleration for Fully Connected Layers in Deep Neural Networks) - 专知论文

会员服务 ·

0

层 · Neural Networks · Processing（编程语言） · Weight · 全连接层 ·

2020 年 11 月 25 日

Low Latency CMOS Hardware Acceleration for Fully Connected Layers in Deep Neural Networks

翻译：深神经网络中完全连接层的低长期 CMOS 硬件加速器

Nick Iliev,Amit Ranjan Trivedi

We present a novel low latency CMOS hardware accelerator for fully connected (FC) layers in deep neural networks (DNNs). The FC accelerator, FC-ACCL, is based on 128 8x8 or 16x16 processing elements (PEs) for matrix-vector multiplication, and 128 multiply-accumulate (MAC) units integrated with 128 High Bandwidth Memory (HBM) units for storing the pretrained weights. Micro-architectural details for CMOS ASIC implementations are presented and simulated performance is compared to recent hardware accelerators for DNNs for AlexNet and VGG 16. When comparing simulated processing latency for a 4096-1000 FC8 layer, our FC-ACCL is able to achieve 48.4 GOPS (with a 100 MHz clock) which improves on a recent FC8 layer accelerator quoted at 28.8 GOPS with a 150 MHz clock. We have achieved this considerable improvement by fully utilizing the HBM units for storing and reading out column-specific FClayer weights in 1 cycle with a novel colum-row-column schedule, and implementing a maximally parallel datapath for processing these weights with the corresponding MAC and PE units. When up-scaled to 128 16x16 PEs, for 16x16 tiles of weights, the design can reduce latency for the large FC6 layer by 60 % in AlexNet and by 3 % in VGG16 when compared to an alternative EIE solution which uses compression.

翻译：我们为深神经网络(DNN)中完全连接(FC)层提供了新型的低悬浮 CMOS硬件加速器。 FC 加速器(FC-ACCL)基于用于矩阵矢量倍增的128 8x8 或 16x16 处理元件(PE),以及与128个高频宽度内存(HBM)单元结合的128个倍积(MAC)单元,用于储存预先训练重量。CMOS ASC ASIC 执行的微结构设计细节被显示,模拟性能与最近为AlexNet 和 VGG16 的DNC 硬件加速器进行比较。当将4096-1000 FC8 层的模拟处理延迟元件(PE)进行对比时,我们的FC-ACCL能够实现48.4 GOPS(加100 MMz钟) 倍增积积(MAC) 单元,在28.8 GPOPS 和150 MHMHS 时间 16 时,我们通过充分利用 HBS 的替代设备储存和读取 16 CLELELE 等特定的硬度重量,在1级的40级平级平级内,将40级平级平级平级内,将40级的40级的平级的平级计算中,这些压压的平级的平级计算。

0

相关内容

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

专知会员服务

46+阅读 · 2020年9月28日

【IJCAJ 2020】多通道神经网络 Multi-Channel Graph Neural Networks

【IJCAJ 2020】多通道神经网络 Multi-Channel Graph Neural Networks

专知会员服务

26+阅读 · 2020年7月19日

神经网络的拓扑结构，TOPOLOGY OF DEEP NEURAL NETWORKS

神经网络的拓扑结构，TOPOLOGY OF DEEP NEURAL NETWORKS

专知会员服务

35+阅读 · 2020年4月15日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

专知会员服务

15+阅读 · 2020年1月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Deep Compression/Acceleration：模型压缩加速论文汇总

Deep Compression/Acceleration：模型压缩加速论文汇总

极市平台

14+阅读 · 2019年5月15日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec智能推荐

6+阅读 · 2019年3月7日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

Highway Networks For Sentence Classification

Highway Networks For Sentence Classification

哈工大SCIR

4+阅读 · 2017年9月30日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Towards Accelerating Training of Batch Normalization: A Manifold Perspective

Towards Accelerating Training of Batch Normalization: A Manifold Perspective

Arxiv

0+阅读 · 2021年1月8日

Accelerated, Optimal, and Parallel: Some Results on Model-Based Stochastic Optimization

Arxiv

0+阅读 · 2021年1月7日

Low-cost and high-performance data augmentation for deep-learning-based skin lesion classification

Arxiv

0+阅读 · 2021年1月7日

Aerial-Ground Interference Mitigation for Cellular-Connected UAV

Arxiv

0+阅读 · 2021年1月6日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

66+阅读 · 2019年9月8日

Accelerated Methods for Deep Reinforcement Learning

Accelerated Methods for Deep Reinforcement Learning

Arxiv

6+阅读 · 2019年1月10日

HAQ: Hardware-Aware Automated Quantization

HAQ: Hardware-Aware Automated Quantization

Arxiv

6+阅读 · 2018年11月21日

A Convolutional Feature Map based Deep Network targeted towards Traffic Detection and Classification

Arxiv

3+阅读 · 2018年5月22日

Accelerated Reinforcement Learning

Arxiv

6+阅读 · 2018年4月24日

Dynamic Weight Alignment for Convolutional Neural Networks

Arxiv

6+阅读 · 2018年1月25日

VIP会员

文章信息

相关主题

Neural Networks

Processing（编程语言）

相关VIP内容

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

专知会员服务

46+阅读 · 2020年9月28日

【IJCAJ 2020】多通道神经网络 Multi-Channel Graph Neural Networks

【IJCAJ 2020】多通道神经网络 Multi-Channel Graph Neural Networks

专知会员服务

26+阅读 · 2020年7月19日

神经网络的拓扑结构，TOPOLOGY OF DEEP NEURAL NETWORKS

神经网络的拓扑结构，TOPOLOGY OF DEEP NEURAL NETWORKS

专知会员服务

35+阅读 · 2020年4月15日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

【AdaMod】一个新的深度学习优化与记忆（Meet AdaMod: a new deep learning optimizer with memory）

专知会员服务

15+阅读 · 2020年1月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《美陆军特种作战条令》最新102页

《洛克希德SR-71“黑鸟”侦察机动力系统》21页slides

美空军作战实验室通过人工智能和指挥控制技术创新推进杀伤链

《指挥控制能力分析方法论》最新报告

相关资讯

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Deep Compression/Acceleration：模型压缩加速论文汇总

Deep Compression/Acceleration：模型压缩加速论文汇总

极市平台

14+阅读 · 2019年5月15日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec智能推荐

6+阅读 · 2019年3月7日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

Highway Networks For Sentence Classification

Highway Networks For Sentence Classification

哈工大SCIR

4+阅读 · 2017年9月30日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Towards Accelerating Training of Batch Normalization: A Manifold Perspective

Towards Accelerating Training of Batch Normalization: A Manifold Perspective

Arxiv

0+阅读 · 2021年1月8日

Accelerated, Optimal, and Parallel: Some Results on Model-Based Stochastic Optimization

Arxiv

0+阅读 · 2021年1月7日

Low-cost and high-performance data augmentation for deep-learning-based skin lesion classification

Arxiv

0+阅读 · 2021年1月7日

Aerial-Ground Interference Mitigation for Cellular-Connected UAV

Arxiv

0+阅读 · 2021年1月6日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

66+阅读 · 2019年9月8日

Accelerated Methods for Deep Reinforcement Learning

Accelerated Methods for Deep Reinforcement Learning

Arxiv

6+阅读 · 2019年1月10日

HAQ: Hardware-Aware Automated Quantization

HAQ: Hardware-Aware Automated Quantization

Arxiv

6+阅读 · 2018年11月21日

A Convolutional Feature Map based Deep Network targeted towards Traffic Detection and Classification

Arxiv

3+阅读 · 2018年5月22日

Accelerated Reinforcement Learning

Arxiv

6+阅读 · 2018年4月24日

Dynamic Weight Alignment for Convolutional Neural Networks

Arxiv

6+阅读 · 2018年1月25日

微信扫码咨询专知VIP会员