The combination of Winograd's algorithm and the systolic array architecture has been shown to improve DSP efficiency when accelerating convolutional neural networks (CNNs) on FPGA platforms. However, supporting arbitrary convolution kernel sizes in FPGA-based Winograd processing elements and providing efficient data access remain underexplored. In this work, we are the first to propose an optimized Winograd processing element (WinoPE) that naturally supports multiple convolution kernel sizes with the same amount of computing resources while maintaining high runtime DSP efficiency. Using the proposed WinoPE, we construct a highly efficient systolic array accelerator, termed WinoCNN. We also propose a dedicated memory subsystem to optimize data access. Based on the accelerator architecture, we build accurate resource and performance models to explore optimal accelerator configurations under different resource constraints. We implement the proposed accelerator on multiple FPGAs, and it outperforms state-of-the-art designs in both throughput and DSP efficiency. Our implementation achieves a DSP efficiency of up to 1.33 GOPS/DSP and a throughput of up to 3.1 TOPS on the Xilinx ZCU102 FPGA, which are 29.1\% and 20.0\% better, respectively, than the best previously reported solutions.
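For reference, the Winograd minimal filtering algorithm underlying the WinoPE computes a tile of convolution outputs as a transform-domain elementwise product. The standard 1-D $F(2,3)$ instance below is taken from the general Winograd minimal-filtering literature, not from this paper's specific WinoPE design, and is shown only to illustrate the transform structure the abstract refers to:
\[
  Y = A^{T}\bigl[(G\,g) \odot (B^{T}\,d)\bigr],\qquad
  B^{T} =
  \begin{bmatrix}
    1 & 0 & -1 & 0\\
    0 & 1 & 1 & 0\\
    0 & -1 & 1 & 0\\
    0 & 1 & 0 & -1
  \end{bmatrix},\quad
  G =
  \begin{bmatrix}
    1 & 0 & 0\\
    \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2}\\
    \tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2}\\
    0 & 0 & 1
  \end{bmatrix},\quad
  A^{T} =
  \begin{bmatrix}
    1 & 1 & 1 & 0\\
    0 & 1 & -1 & -1
  \end{bmatrix},
\]
where $d$ is a 4-element input tile, $g$ is a 3-tap filter, and $\odot$ denotes elementwise multiplication. Only four multiplications produce the two outputs that direct convolution would compute with six, which is the source of the DSP-efficiency gains discussed above.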