Binary Neural Networks enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm$^2$ binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data re-use, data buffering, latch-based memories, and voltage scaling, it achieves a throughput of 241 GOPS while consuming just 1.1 mW at 0.4 V/154 MHz during inference of binary CNNs with kernels of up to 7x7, leading to a peak core energy efficiency of 223 TOPS/W. ChewBaccaNN's flexibility allows it to run a much wider range of binary CNNs than other accelerators, drastically improving the accuracy-energy trade-off beyond what the TOPS/W metric can capture. In fact, it can perform CIFAR-10 inference at 86.8% accuracy with merely 1.3 $\mu$J, thus achieving higher accuracy at a 2.8x lower energy cost than even the most efficient, and much larger, analog processing-in-memory devices, while retaining the flexibility to run larger CNNs for higher accuracy when needed. It also runs a binary ResNet-18 trained on the 1000-class ILSVRC dataset and improves the energy efficiency by 4.4x over accelerators of similar flexibility. Furthermore, it can run inference on a binarized ResNet-18 trained with 8-base Group-Net, reaching 67.5% Top-1 accuracy at only 3.0 mJ/frame, an accuracy drop of merely 1.8% from the full-precision ResNet-18.
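To make the claimed reduction in computational complexity concrete, here is a minimal sketch of the XNOR-popcount formulation commonly used for binary CNNs. When weights and activations are constrained to {-1, +1} and packed one value per bit, a dot product reduces to a bitwise XNOR followed by a population count. This is the generic technique, not a description of ChewBaccaNN's actual datapath. The function names `pack_bits` and `binary_dot` and the bit encoding (+1 stored as 1, -1 stored as 0) are illustrative assumptions.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (assumed encoding: bit i = 1 for +1, 0 for -1)."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors given in packed form.

    XNOR marks the positions where the signs agree (product +1).
    With matches = popcount(XNOR), the dot product is
    matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    xnor = ~(a_bits ^ w_bits) & mask     # 1 where activation and weight agree
    matches = bin(xnor).count("1")       # population count
    return 2 * matches - n


# Hypothetical 3x3 window flattened to 9 values, plus a matching binary kernel.
activations = [1, -1, 1, 1, -1, -1, 1, -1, 1]
weights = [1, 1, -1, 1, -1, 1, 1, -1, -1]
result = binary_dot(pack_bits(activations), pack_bits(weights), len(activations))
print(result)  # 1, same as sum(a * w for a, w in zip(activations, weights))
```

In this sketch the nine multiply-accumulate operations of a 3x3 window collapse into one XNOR, one mask, and one popcount on a single word, and each weight occupies 1 bit instead of, for example, 32 bits for a float32 weight. This is the kind of saving in compute and memory footprint that binary accelerators exploit.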