The computational and energy-efficiency demands of Convolutional Neural Network (CNN) applications call for a new paradigm to overcome the "Memory Wall". Analog In-Memory Computing (AIMC) is a promising candidate, since it performs matrix-vector multiplication, the critical kernel of many ML workloads, in place in the analog domain, within memory arrays structured as crossbars of memory cells. However, several factors limit the full exploitation of this technology, including the physical fabrication of the crossbar devices, which constrains the memory capacity of a single array. Multi-AIMC architectures have been proposed to overcome this limitation, but they have been demonstrated only on tiny, custom CNNs, or with some layers executed off-chip. In this work, we present the full end-to-end inference of a ResNet-18 DNN on a 512-cluster heterogeneous architecture coupling AIMC cores with digital RISC-V cores, achieving up to 20.2 TOPS. Moreover, we analyze the mapping of the network onto the available non-volatile cells, compare it with state-of-the-art models, and derive guidelines for next-generation many-core architectures based on AIMC devices.
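To make the crossbar idea concrete, the following is a minimal NumPy sketch of how an analog MVM can be modeled: signed weights are encoded as differential pairs of non-negative cell conductances, the input vector drives the rows, and the column currents accumulate every dot product in a single analog step before ADC conversion. The tile size, noise model, and ADC resolution below are illustrative assumptions, not the parameters of the architecture described above.

```python
import numpy as np

def aimc_mvm(weights, x, g_max=1.0, adc_bits=8, noise_std=0.0, rng=None):
    """Sketch of one analog matrix-vector multiplication on a crossbar tile.

    Signed weights are split onto a differential pair of non-negative
    conductance arrays (G+, G-); the input vector drives the rows, and the
    column currents accumulate all dot products in one analog step
    (Ohm's and Kirchhoff's laws), followed by ADC quantization.
    """
    rng = np.random.default_rng() if rng is None else rng
    w_scale = np.abs(weights).max() / g_max           # weight -> conductance scale
    g_pos = np.clip(weights, 0.0, None) / w_scale     # G+ conductance array
    g_neg = np.clip(-weights, 0.0, None) / w_scale    # G- conductance array
    i_out = g_pos @ x - g_neg @ x                     # differential bit-line currents
    if noise_std > 0.0:                               # optional Gaussian read-noise model
        i_out = i_out + rng.normal(0.0, noise_std * np.abs(i_out).max(), i_out.shape)
    lsb = np.abs(i_out).max() / (2 ** (adc_bits - 1) - 1)  # ADC least-significant bit
    i_q = np.round(i_out / lsb) * lsb                 # column-wise ADC conversion
    return i_q * w_scale                              # digital rescaling to weight units

# A hypothetical 256x256 tile multiplying a random activation vector "in one shot".
W = np.random.randn(256, 256).astype(np.float32)
x = np.random.randn(256).astype(np.float32)
y = aimc_mvm(W, x, adc_bits=8)
print("max abs error vs. W @ x:", np.max(np.abs(y - W @ x)))
```

Because the whole multiplication collapses into a single read of the array, throughput scales with the number of tiles rather than with the operation count, which is what motivates the multi-AIMC, many-cluster organization discussed in the abstract.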