Deployment of modern TinyML tasks on small, battery-constrained IoT devices requires high computational energy efficiency. Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference and can also serve as on-chip storage for DNN weights. However, IMC's functional flexibility limitations, and their impact on performance, energy, and area efficiency, are not yet fully understood at the system level. To target practical end-to-end IoT applications, IMC arrays must be embedded in heterogeneous programmable systems, introducing new system-level challenges which we aim to address in this work. We present a heterogeneous, tightly coupled clustered architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators. We benchmark the system on a highly heterogeneous workload, the Bottleneck layer of MobileNetV2, showing 11.5x performance and 9.5x energy efficiency improvements over highly optimized parallel execution on the cores. Furthermore, we explore the IMC array resources required for end-to-end inference of a full mobile-grade DNN (MobileNetV2) by scaling up our heterogeneous architecture to a multi-array accelerator. Our results show that, on end-to-end MobileNetV2 inference, our solution achieves one order of magnitude lower execution latency than existing programmable architectures and two orders of magnitude lower than state-of-the-art heterogeneous solutions integrating in-memory computing analog cores.
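For illustration, the sketch below shows one plausible way such a Bottleneck layer could be partitioned across a cluster of this kind: the two pointwise (1x1) convolutions are per-pixel matrix-vector products, the natural fit for an analog IMC crossbar, while the depthwise 3x3 convolution stays on the digital cores. This is a minimal, self-contained emulation under stated assumptions, not the presented architecture's actual API: `ima_pointwise` merely stands in for the analog accelerator (whose weights would be pre-programmed into the NVM array), and all names, dimensions, and the toy requantization are illustrative.

```c
/* Hypothetical mapping of a MobileNetV2-style Bottleneck:
 *   1x1 expand  -> IMC array (emulated), 3x3 depthwise -> digital cores,
 *   1x1 project -> IMC array (emulated). All parameters are illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { H = 14, W = 14, C_IN = 32, C_EXP = 192 };  /* toy layer shape */

static int8_t quant(int32_t v) {        /* toy requantization: >>7, clamp */
    v >>= 7;
    if (v > 127) v = 127;
    if (v < -128) v = -128;
    return (int8_t)v;
}

/* Stand-in for the analog IMC array: one matrix-vector product per pixel,
 * with the weight matrix assumed pre-programmed into the NVM crossbar. */
static void ima_pointwise(const int8_t *x, const int8_t *w, int8_t *y,
                          int c_in, int c_out) {
    for (int p = 0; p < H * W; p++)
        for (int o = 0; o < c_out; o++) {
            int32_t acc = 0;
            for (int i = 0; i < c_in; i++)
                acc += (int32_t)w[(size_t)o * c_in + i] * x[(size_t)p * c_in + i];
            y[(size_t)p * c_out + o] = quant(acc);
        }
}

/* Depthwise 3x3 (stride 1, zero padding), kept on the digital side. */
static void dw_conv3x3(const int8_t *x, const int8_t *w, int8_t *y, int ch) {
    for (int r = 0; r < H; r++)
        for (int c = 0; c < W; c++)
            for (int k = 0; k < ch; k++) {
                int32_t acc = 0;
                for (int dr = -1; dr <= 1; dr++)
                    for (int dc = -1; dc <= 1; dc++) {
                        int rr = r + dr, cc = c + dc;
                        if (rr < 0 || rr >= H || cc < 0 || cc >= W) continue;
                        acc += (int32_t)w[(size_t)k * 9 + (dr + 1) * 3 + (dc + 1)]
                             * x[((size_t)rr * W + cc) * ch + k];
                    }
                y[((size_t)r * W + c) * ch + k] = quant(acc);
            }
}

static int8_t in[H * W * C_IN], ex[H * W * C_EXP], dw[H * W * C_EXP], out[H * W * C_IN];
static int8_t w_exp[C_EXP * C_IN], w_dw[C_EXP * 9], w_prj[C_IN * C_EXP];

int main(void) {
    memset(in, 1, sizeof in);           /* dummy activations and weights */
    memset(w_exp, 1, sizeof w_exp);
    memset(w_dw, 1, sizeof w_dw);
    memset(w_prj, 1, sizeof w_prj);

    ima_pointwise(in, w_exp, ex, C_IN, C_EXP);   /* 1x1 expand  -> "IMA"   */
    dw_conv3x3(ex, w_dw, dw, C_EXP);             /* 3x3 dwise   -> digital */
    ima_pointwise(dw, w_prj, out, C_EXP, C_IN);  /* 1x1 project -> "IMA"   */

    printf("out[0] = %d\n", out[0]);
    return 0;
}
```

In a real system of this kind, the pointwise stages could be issued as jobs to the IMA while the cores run the depthwise stage and requantization in parallel; this interleaving of analog and digital phases is what makes the Bottleneck a representative stress test for heterogeneous scheduling.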