Weight Transformations in Bit-Sliced Crossbar Arrays for Fault Tolerant Computing-in-Memory: Design Techniques and Evaluation Framework

Deploying deep neural networks (DNNs) on compute-in-memory (CiM) accelerators reduces data movement during inference, which yields significant energy savings and speed-up. However, the reliability of CiM-based systems is challenged by stuck-at faults (SAFs) in memory cells, which corrupt stored weights and degrade accuracy. Closest value mapping (CVM) has been shown to partially mitigate these effects for multi-bit DNNs deployed on bit-sliced crossbars, but its fault tolerance is often insufficient under high SAF rates or for complex tasks. In this work, we propose two training-free weight transformation techniques, sign-flip and bit-flip, that enhance SAF tolerance in multi-bit DNNs deployed on bit-sliced crossbar arrays. Sign-flip operates at the weight-column level by selecting between a weight and its negation, whereas bit-flip provides finer granularity by selectively inverting individual bit slices. Both methods expand the search space for fault-aware mappings, operate synergistically with CVM, and require no retraining or additional memory. To enable scalability, we introduce a look-up-table (LUT)-based framework that accelerates the computation of optimal transformations and supports rapid evaluation across models and fault rates. Extensive experiments on ResNet-18, ResNet-50, and ViT models with CIFAR-100 and ImageNet demonstrate that the proposed techniques recover most of the accuracy lost under SAF injection. Hardware analysis shows that sign-flip incurs negligible energy, latency, and area cost, while bit-flip provides higher fault resilience at modest overhead. These results establish sign-flip and bit-flip as practical and scalable SAF-mitigation strategies for CiM-based DNN accelerators.
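To make the interplay of CVM, sign-flip, and bit-flip concrete, the following is a minimal Python sketch under simplifying assumptions that are not taken from the abstract: weights are 4-bit two's-complement integers, each bit occupies one crossbar cell, a faulty cell is stuck at 0 or 1, the mapping error is the sum of absolute differences over a column, and both transformations are chosen once per column by exhaustive search. All names (`cvm_error`, `column_error`, `best_sign_flip`, `best_bit_flip`) are hypothetical and only illustrate the idea; the paper's actual encoding and search procedure may differ.

```python
from itertools import product

BITS = 4  # assumed precision: one two's-complement bit per crossbar cell


def to_bits(v):
    """Two's-complement bits of v, least significant bit first."""
    return [(v >> k) & 1 for k in range(BITS)]


def cvm_error(target, faults):
    """Closest value mapping: distance from target to the nearest value
    whose bits agree with every stuck cell.
    faults[k] is None for a healthy cell, else the stuck bit (0 or 1)."""
    lo, hi = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
    return min(
        abs(v - target)
        for v in range(lo, hi + 1)
        if all(f is None or b == f for b, f in zip(to_bits(v), faults))
    )


def column_error(weights, faults, sign=1, flips=(0,) * BITS):
    """Total CVM error of one column under a sign choice and a per-slice
    inversion mask (both assumed to be undone digitally at read-out)."""
    total = 0
    for w, cell_faults in zip(weights, faults):
        # Inverting slice k turns a physical stuck-at-s cell into logical s^1.
        logical = [None if f is None else f ^ m
                   for f, m in zip(cell_faults, flips)]
        total += cvm_error(sign * w, logical)
    return total


def best_sign_flip(weights, faults):
    """Store either the column or its negation, whichever maps better."""
    return min(column_error(weights, faults, sign=s) for s in (1, -1))


def best_bit_flip(weights, faults):
    """Search all per-slice inversion masks for the column."""
    return min(column_error(weights, faults, flips=m)
               for m in product((0, 1), repeat=BITS))


# Toy column: weight 3 has its MSB stuck at 1, weight 5 has its LSB stuck at 0.
weights = [3, 5]
faults = [[None, None, None, 1], [0, None, None, None]]
print(column_error(weights, faults),    # CVM only   -> 5
      best_sign_flip(weights, faults),  # sign-flip  -> 1
      best_bit_flip(weights, faults))   # bit-flip   -> 0
```

In this toy column, CVM alone leaves an error of 5, choosing the negated column reduces it to 1, and inverting the two affected slices removes it entirely, which mirrors the finer granularity attributed to bit-flip. Because `cvm_error` depends only on the target value and the fault pattern, its results could in principle be tabulated once and reused, which is the kind of reuse a look-up table makes possible.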
