In-memory computing with resistive crossbar arrays has been suggested to accelerate deep-learning workloads in a highly efficient manner. To unleash the full potential of in-memory computing, it is desirable to accelerate the training as well as the inference of large deep neural networks (DNNs). In the past, specialized in-memory training algorithms have been proposed that not only accelerate the forward and backward passes, but also establish tricks to update the weights in-memory and in parallel. However, the state-of-the-art algorithm (Tiki-Taka version 2 (TTv2)) still requires near-perfect offset correction and suffers from potential biases that might occur due to programming and estimation inaccuracies, as well as longer-term instabilities of the device materials. Here we propose and describe two new and improved algorithms for in-memory computing (Chopped-TTv2 (c-TTv2) and Analog Gradient Accumulation with Dynamic reference (AGAD)) that retain the same runtime complexity but correct for any remaining offsets using choppers. These algorithms greatly relax the device requirements and thus expand the scope of materials potentially suitable for such fast in-memory DNN training.
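To illustrate the general chopping idea invoked above (not the authors' specific c-TTv2 or AGAD procedures, whose details are not given here): modulating the accumulated signal with an alternating ±1 chopper sign and demodulating with the same sign makes any constant offset cancel in successive pairs of steps, while the true gradient signal adds coherently. A minimal sketch, with hypothetical names and a synthetic constant offset standing in for device and read-out biases:

```python
import numpy as np  # imported for consistency with typical numeric code; plain floats suffice here

TRUE_GRAD = 0.5   # the signal we actually want to accumulate
OFFSET = 0.3      # unknown constant offset from programming/read-out inaccuracies (assumed model)

def accumulate(n_steps: int, chopped: bool) -> float:
    """Average n_steps noisy-offset read-outs, optionally with chopping."""
    total = 0.0
    for t in range(n_steps):
        chop = (-1) ** t if chopped else 1
        # The chopper sign is applied before the offset enters the signal path,
        # so the measured value is chop * signal + offset.
        measured = chop * TRUE_GRAD + OFFSET
        # Demodulate with the same sign: chop * measured = TRUE_GRAD + chop * OFFSET,
        # and the chop * OFFSET terms cancel pairwise over an even number of steps.
        total += chop * measured
    return total / n_steps

plain = accumulate(100, chopped=False)    # biased: converges to TRUE_GRAD + OFFSET
chopped = accumulate(100, chopped=True)   # unbiased: offset contributions cancel
```

In this toy model the un-chopped average settles at 0.8 (signal plus offset) while the chopped average recovers 0.5 exactly after an even number of steps, which is the sense in which chopping relaxes the offset-correction requirements on the devices.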