Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movement. One method to decrease these costs, at the price of some accuracy, is weight and/or activation word-length reduction. Layer-wise mixed-precision quantization enables more efficient results, but inflates the design space. In this work, we present an in-depth quantitative methodology to efficiently explore this design space under the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the design entry levels from the architectural down to the logic level, and laterally covers optimizations from the processing elements to the dataflow of an efficient mixed-precision CNN accelerator. The resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs yields competitive accuracy-throughput trade-offs: 245 frames/s with 87.48% Top-5 accuracy for ResNet-18 and 92.9% Top-5 accuracy at 1.13 TOps/s for ResNet-152, respectively. Thereby, the required memory footprint for parameters is reduced by 4.9x and 9.4x compared to the respective floating-point baselines.