The residual block is a very common component in recent state-of-the-art CNNs such as EfficientNet and EfficientDet. Shortcut data accounts for nearly 40% of the feature-map accesses in ResNet152 [8], yet most previous DNN compilers and accelerators ignore shortcut data optimization. This paper presents ShortcutFusion, an optimization tool for FPGA-based accelerators with reuse-aware static memory allocation for shortcut data, to maximize on-chip data reuse given resource constraints. From a TensorFlow DNN model, the proposed design generates instruction sets for a group of nodes, using an optimized data-reuse scheme for each residual block. The accelerator design implemented on a Xilinx KCU1500 FPGA card is 2.8x faster and 9.9x more power efficient than an NVIDIA RTX 2080 Ti for a 256x256 input size. Compared to a baseline in which the weights, inputs, and outputs are accessed from off-chip memory exactly once per layer, ShortcutFusion reduces DRAM access by 47.8-84.8% for RetinaNet, YOLOv3, ResNet152, and EfficientNet. Given a buffer size similar to ShortcutMining [8], which also mines shortcut data in hardware, the proposed work reduces off-chip feature-map accesses by 5.27x while accessing weights from off-chip memory exactly once.
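The core idea, reuse-aware static allocation of shortcut data, can be illustrated with a minimal Python sketch. This is not the paper's actual tool: the names (ResidualBlock, plan_shortcut_reuse) and the fit-or-spill policy are illustrative assumptions, and the sketch assumes shortcut lifetimes do not overlap, as in a plain residual chain. At compile time, each shortcut feature map that fits in the on-chip buffer budget is pinned on-chip, eliminating its DRAM write/read pair; otherwise it falls back to off-chip memory.

from dataclasses import dataclass

@dataclass
class ResidualBlock:
    name: str
    shortcut_bytes: int  # size of the block's shortcut feature map

def plan_shortcut_reuse(blocks, shortcut_buffer_bytes):
    """Statically decide, per residual block, whether its shortcut
    feature map stays in the on-chip buffer or spills to DRAM.
    Assumes shortcut lifetimes do not overlap (a plain residual
    chain), so each shortcut only has to fit by itself."""
    return {
        blk.name: ("on_chip" if blk.shortcut_bytes <= shortcut_buffer_bytes
                   else "dram")
        for blk in blocks
    }

if __name__ == "__main__":
    # Hypothetical block sizes; a real compiler would read them
    # from the TensorFlow graph.
    blocks = [
        ResidualBlock("res2a", 512 * 1024),
        ResidualBlock("res3a", 1024 * 1024),
        ResidualBlock("res4a", 256 * 1024),
    ]
    print(plan_shortcut_reuse(blocks, shortcut_buffer_bytes=768 * 1024))

Because the decision is made per block against a fixed buffer budget, the plan is fully static: no runtime bookkeeping is needed, which is what allows the instruction sets for each residual block to encode the chosen reuse pattern ahead of time.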