Realizing today's cloud-level artificial intelligence (AI) functionalities directly on devices distributed at the edge of the internet calls for edge hardware capable of processing multiple modalities of sensory data (e.g., video, audio) at unprecedented energy efficiency. Today's AI hardware architectures cannot meet this demand because of a fundamental "memory wall": data movement between separate compute and memory units consumes substantial energy and incurs long latency. Resistive random-access memory (RRAM) based compute-in-memory (CIM) architectures promise orders-of-magnitude improvements in energy efficiency by performing computation directly within memory. However, conventional approaches to CIM hardware design limit the functional flexibility needed to process diverse AI workloads, and they must overcome hardware imperfections that degrade inference accuracy. Such trade-offs between efficiency, versatility, and accuracy cannot be addressed by isolated improvements at any single level of the design. By co-optimizing across all hierarchies of the design, from algorithms and architecture to circuits and devices, we present NeuRRAM, the first multimodal edge AI chip using RRAM CIM to simultaneously deliver a high degree of versatility for diverse model architectures, record energy efficiency $5\times$ to $8\times$ better than prior art across various computational bit-precisions, and inference accuracy comparable to software models with 4-bit weights on all measured standard AI benchmarks, including 99.0% accuracy on MNIST and 85.7% on CIFAR-10 image classification, 84.7% accuracy on Google speech command recognition, and a 70% reduction in image reconstruction error on a Bayesian image recovery task. This work paves the way towards building highly efficient and reconfigurable edge AI hardware platforms for the more demanding and heterogeneous AI applications of the future.