Analog in-memory computing (AIMC) cores offer significant performance and energy benefits for neural network inference with respect to digital logic (e.g., CPUs). AIMC cores accelerate matrix-vector multiplications, which dominate these applications' run-time. However, AIMC-centric platforms lack the flexibility of general-purpose systems, as they often have hard-coded data flows and can only support a limited set of processing functions. With the goal of bridging this gap in flexibility, we present a novel system architecture that tightly integrates analog in-memory computing accelerators into multi-core CPUs in general-purpose systems. We developed ALPINE, a powerful full-system simulation framework built on the gem5-X simulator, which enables an in-depth characterization of the proposed architecture. ALPINE allows the simulation of the entire computer architecture stack, from major hardware components to their interactions with the Linux OS. Within ALPINE, we have defined a custom ISA extension and a software library to facilitate the deployment of inference models. We showcase and analyze a variety of mappings of different neural network types, and demonstrate up to 20.5x/20.8x performance/energy gains with respect to a SIMD-enabled ARM CPU implementation for convolutional neural networks, multi-layer perceptrons, and recurrent neural networks.
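To illustrate the core operation that AIMC tiles accelerate, the sketch below emulates a matrix-vector multiplication on a crossbar whose weights are "programmed" at reduced precision, as analog conductances would be. All names and the quantization scheme are purely illustrative assumptions, not the actual ALPINE ISA extension or software library API.

```python
import numpy as np

def program_crossbar(weights, bits=8):
    """Quantize weights to emulate their storage as analog conductances.

    Hypothetical helper: real AIMC programming involves device-level
    conductance tuning; here we only model the precision loss.
    """
    scale = np.abs(weights).max() / (2 ** (bits - 1) - 1)
    return np.round(weights / scale).astype(np.int32), scale

def aimc_mvm(crossbar, scale, x):
    """Matrix-vector product on the 'programmed' crossbar.

    In hardware this happens in one analog step (Ohm's and Kirchhoff's
    laws); digitally we emulate it with an integer matmul plus rescaling.
    """
    return (crossbar @ x) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # layer weights, mapped to the crossbar
x = rng.standard_normal(8)        # input activations

crossbar, scale = program_crossbar(W)
y = aimc_mvm(crossbar, scale, x)
# y approximates the full-precision product W @ x, up to quantization error
```

The point of the sketch is that the expensive O(n*m) multiply-accumulate loop collapses into a single crossbar read in hardware, which is where the reported performance and energy gains over a SIMD CPU come from.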