Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural Network (DNN) inference by reducing costly data movement and by using resistive RAM (ReRAM) for efficient analog compute. Unfortunately, overall PIM accelerator efficiency is limited by energy-intensive analog-to-digital converters (ADCs). Furthermore, existing accelerators that reduce ADC cost do so by changing DNN weights or by using low-resolution ADCs that reduce output fidelity. These strategies harm DNN accuracy and/or require costly DNN retraining to compensate. To address these issues, we propose the RAELLA architecture. RAELLA adapts the architecture to each DNN; it lowers the resolution of computed analog values by encoding weights to produce near-zero analog values, adaptively slicing weights for each DNN layer, and dynamically slicing inputs through speculation and recovery. Low-resolution analog values allow RAELLA to both use efficient low-resolution ADCs and maintain accuracy without retraining, all while computing with fewer ADC conversions. Compared to other low-accuracy-loss PIM accelerators, RAELLA increases energy efficiency by up to 4.9$\times$ and throughput by up to 3.3$\times$. Compared to PIM accelerators that cause accuracy loss and retrain DNNs to recover, RAELLA achieves similar efficiency and throughput without expensive DNN retraining.
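The core idea of encoding weights to produce near-zero analog values can be illustrated with a minimal numerical sketch. This is not RAELLA's exact scheme; it assumes a hypothetical center-offset encoding in which each crossbar column stores weights as offsets from a per-column center, shrinking the magnitude of the analog column sums the ADC must digitize, while a cheap digital correction restores the exact dot product:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.1, 0.05, size=(128, 4))  # 128 rows x 4 crossbar columns
inputs = rng.random(128)                        # input activations

# Hypothetical encoding: store each column as offsets from its mean ("center"),
# so the analog values accumulated on each column concentrate near zero.
centers = weights.mean(axis=0)
offsets = weights - centers

analog_sums = inputs @ offsets                  # what the ADC would digitize
restored = analog_sums + centers * inputs.sum() # digital correction per column

exact = inputs @ weights
assert np.allclose(restored, exact)             # dot product is recovered exactly

# Encoded sums span a much smaller range than the raw sums, so a
# lower-resolution ADC can cover them without clipping.
print("encoded max:", np.abs(analog_sums).max())
print("raw max:    ", np.abs(exact).max())
```

The correction term `centers * inputs.sum()` is computed digitally once per column, so the expensive analog-to-digital step only ever sees the small offset sums.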