The rapid growth of deep neural network (DNN) workloads has significantly increased the demand for large-capacity on-chip SRAM in machine learning (ML) applications, with SRAM arrays now occupying a substantial fraction of the total die area. To address the dual challenges of storage density and computation efficiency, this paper proposes an NVM-in-Cache architecture that integrates resistive RAM (RRAM) devices into a conventional 6T-SRAM cell, forming a compact 6T-2R bit-cell. This hybrid cell enables a Processing-in-Memory (PIM) mode that performs massively parallel multiply-and-accumulate (MAC) operations directly on the cache power lines while preserving the stored cache data. By exploiting the intrinsic properties of the 6T-2R structure, the architecture provides additional storage capability and high computational throughput without any bit-cell area overhead. Circuit- and array-level simulations in GlobalFoundries 22nm FDSOI technology demonstrate that the proposed design achieves a throughput of 0.4 TOPS and an energy efficiency of 452.34 TOPS/W. With 128 row-parallel operations, CIFAR-10 classification is demonstrated by mapping a ResNet-18 neural network, achieving an accuracy of 91.76%. These results highlight the potential of the NVM-in-Cache approach to serve as a scalable, energy-efficient computing method that re-purposes the existing 6T-SRAM cache architecture for next-generation AI accelerators and general-purpose processors.
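To make the MAC scheme described above concrete, below is a minimal behavioral sketch in Python of column-parallel analog accumulation in an RRAM-based array: binary activations gate a read voltage onto each row, the stored bits select between low- and high-resistance conductance states, and the per-column currents sum by Kirchhoff's current law before being digitized. All device values (G_LRS, G_HRS, V_READ) and the idealized rounding "ADC" are illustrative assumptions for the sketch, not parameters of the paper's 22nm 6T-2R design.

```python
import numpy as np

# Minimal behavioral sketch of a column-parallel in-memory MAC.
# Device values are illustrative assumptions, not the paper's parameters.
G_LRS = 100e-6   # hypothetical low-resistance-state conductance (S), weight = 1
G_HRS = 10e-9    # hypothetical high-resistance-state conductance (S), weight = 0
V_READ = 0.1     # hypothetical read voltage driven onto an activated row (V)

def pim_mac(activations, weights):
    """Analog MAC per column: activations (0/1) gate the read voltage on each
    row, stored bits (0/1) select the RRAM conductance, and the resulting
    currents sum on each column line by Kirchhoff's current law."""
    g = np.where(weights == 1, G_LRS, G_HRS)   # map stored bits to conductances
    v = activations[:, None] * V_READ          # 0 V or V_READ on each row
    i_col = (v * g).sum(axis=0)                # summed current per column
    lsb = V_READ * G_LRS                       # current of a single 1*1 product
    return np.rint(i_col / lsb).astype(int)    # idealized ADC: round to counts

# Example: 128 row-parallel operations, matching the reported configuration.
rng = np.random.default_rng(0)
acts = rng.integers(0, 2, size=128)            # binary input activations
wts = rng.integers(0, 2, size=(128, 8))        # binary weights, 8 columns
print(pim_mac(acts, wts))                      # analog-model MAC result
print((acts[:, None] & wts).sum(axis=0))       # exact digital reference
```

In hardware, the rounding step corresponds to sense-amplifier or ADC quantization; non-idealities such as finite HRS leakage and wire resistance, which this sketch largely ignores, are what bound the number of rows that can be activated in parallel.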

