Computing-in-memory (CIM) has attracted significant attention in recent years due to its massive parallelism and low power consumption. However, current CIM designs suffer from the large area overhead of small CIM macros and poor programmability for model execution. This paper proposes a programmable CIM processor with a single large CIM macro instead of multiple smaller ones for power-efficient computation, together with a flexible instruction set that easily supports various binary 1-D convolutional neural network (CNN) models. Furthermore, the proposed architecture adopts a pooling write-back method that supports fused or independent convolution/pooling operations, reducing latency by 35.9\%, and a flexible ping-pong feature SRAM that accommodates different feature-map sizes during layer-by-layer execution. The design, fabricated in TSMC 28nm technology, achieves 150.8 GOPS throughput and 885.86 TOPS/W power efficiency at 10 MHz when executing our binary keyword-spotting model, offering higher power efficiency and flexibility than previous designs.