ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

February 26, 2021


Renzo Andri, Geethan Karunaratne, Lukas Cavigelli, Luca Benini

From arXiv; accepted at IEEE ISCAS 2021, Daegu, South Korea, 23-26 May 2021.

Binary neural networks (BNNs) enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm$^2$ binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data reuse, data buffering, latch-based memories, and voltage scaling, it achieves a throughput of 241 GOPS while consuming just 1.1 mW at 0.4 V/154 MHz during inference of binary CNNs with kernels of up to 7x7, leading to a peak core energy efficiency of 223 TOPS/W. ChewBaccaNN's flexibility allows it to run a much wider range of binary CNNs than other accelerators, improving the accuracy-energy trade-off in ways the TOPS/W metric alone cannot capture. In fact, it performs CIFAR-10 inference at 86.8% accuracy with merely 1.3 $\mu$J, exceeding the accuracy of even the most efficient and much larger analog processing-in-memory devices while lowering the energy cost by 2.8x, and it keeps the flexibility to run larger CNNs when higher accuracy is needed. It also runs a binary ResNet-18 trained on the 1000-class ILSVRC dataset and improves energy efficiency by 4.4x over accelerators of similar flexibility. Furthermore, it can perform inference on a binarized ResNet-18 trained with 8-base Group-Net, achieving 67.5% Top-1 accuracy with only 3.0 mJ/frame, an accuracy drop of merely 1.8% from the full-precision ResNet-18.
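The abstract attributes the savings of binary CNNs to a smaller memory footprint and lower computational complexity. A minimal Python sketch of the standard XNOR-popcount trick that BNN hardware commonly builds on is shown below; this is an assumption for illustration, since the abstract does not describe ChewBaccaNN's actual datapath, and the helper names (`pack_bits`, `xnor_popcount_dot`, `tops_per_watt`) and the bit encoding (1 for +1, 0 for -1) are hypothetical. It also divides the rounded headline figures, 241 GOPS at 1.1 mW, which gives about 219 TOPS/W, close to the reported 223 TOPS/W peak.

```python
"""Minimal sketch of XNOR-popcount binary dot products (a generic BNN technique).

Assumption: this is NOT ChewBaccaNN's documented datapath; the encoding
(bit 1 -> +1, bit 0 -> -1) and all helper names here are illustrative.
"""
import random
from typing import Sequence


def pack_bits(values: Sequence[int]) -> int:
    """Pack +1/-1 values into an int bitmask; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two length-n +1/-1 vectors stored as bitmasks.

    Equal bits contribute +1 and differing bits -1, so
    dot = matches - (n - matches) = 2 * popcount(XNOR(a, w)) - n.
    """
    mask = (1 << n) - 1  # keep only the n valid bit positions
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - n


def reference_dot(a: Sequence[int], w: Sequence[int]) -> int:
    """Plain integer dot product, used to cross-check the bitwise version."""
    return sum(x * y for x, y in zip(a, w))


def tops_per_watt(gops: float, power_mw: float) -> float:
    """Energy efficiency in TOPS/W from throughput (GOPS) and power (mW)."""
    return (gops * 1e9) / (power_mw * 1e-3) / 1e12


if __name__ == "__main__":
    rng = random.Random(0)
    # One output pixel of a 7x7 kernel over 32 input channels (illustrative sizes).
    n = 7 * 7 * 32
    acts = [rng.choice((1, -1)) for _ in range(n)]
    weights = [rng.choice((1, -1)) for _ in range(n)]
    fast = xnor_popcount_dot(pack_bits(acts), pack_bits(weights), n)
    assert fast == reference_dot(acts, weights)
    print(f"{n} binary MACs, dot = {fast}")

    # Rounded abstract figures: 241 GOPS at 1.1 mW.
    print(f"{tops_per_watt(241, 1.1):.0f} TOPS/W")  # 219 TOPS/W
```

Storing each ±1 value as a single bit instead of a 32-bit float is roughly the 32x storage reduction behind the memory-footprint claim, and one XNOR plus a popcount replaces a full multiply-accumulate per element. The gap between 219 and 223 TOPS/W is presumably due to rounding of the abstract's power and throughput figures.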


