Compute-in-memory (CIM) systems, particularly those built on ReRAM and memristive technologies, offer a promising path toward energy-efficient neural network computation. However, conventional quantization and compression techniques often fail to fully optimize performance and efficiency on these architectures. In this work, we present a structured quantization method that combines sensitivity analysis with mixed-precision strategies to improve weight storage and computational performance on ReRAM-based CIM systems. Our approach improves ReRAM crossbar utilization and significantly reduces power consumption, latency, and computational load while maintaining high accuracy. Experimental results show 86.33% accuracy at 70% compression, together with a 40% reduction in power consumption, demonstrating the method's effectiveness for power-constrained applications.
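To make the idea of combining sensitivity analysis with mixed precision concrete, the following is a minimal sketch, not the method evaluated in this work. It assumes that a layer's sensitivity can be approximated by the mean squared error that low-precision quantization introduces, and that bit widths are assigned greedily under an average-bit budget. The function names (`quantize`, `quantization_error`, `assign_bits`) and the parameters `low_bits`, `high_bits`, and `budget_bits` are hypothetical and chosen only for illustration.

```python
def quantize(weights, bits):
    """Uniform symmetric quantization of a list of floats (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1                  # positive levels, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def quantization_error(weights, bits):
    """Mean squared error introduced by quantizing to `bits` bits.

    Used here as a stand-in sensitivity score (an assumption of this sketch).
    """
    quantized = quantize(weights, bits)
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)


def assign_bits(layers, low_bits=2, high_bits=8, budget_bits=4.0):
    """Greedy mixed-precision allocation over a dict of layer name -> weights.

    Every layer starts at low_bits. Layers are visited from most to least
    sensitive and promoted to high_bits as long as the weight-averaged bit
    width stays within budget_bits.
    """
    total = sum(len(w) for w in layers.values())
    bits = {name: low_bits for name in layers}
    ranked = sorted(
        layers,
        key=lambda name: quantization_error(layers[name], low_bits),
        reverse=True,
    )
    used = low_bits * total
    for name in ranked:
        extra = (high_bits - low_bits) * len(layers[name])
        if (used + extra) / total <= budget_bits:
            bits[name] = high_bits
            used += extra
    return bits
```

For example, calling `assign_bits` on two equally sized layers with a 5-bit average budget keeps the layer with the smaller quantization error at 2 bits and promotes the other to 8 bits. A real flow would replace the error proxy with an accuracy-based sensitivity measure and account for how many crossbar cells each bit width occupies.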