Representing deep neural networks (DNNs) in low precision is a promising approach to enable efficient acceleration and memory reduction. Previous methods that train DNNs in low precision typically keep a high-precision copy of the weights during weight updates. Directly training with low-precision weights leads to accuracy degradation due to complex interactions between the low-precision number systems and the learning algorithms. To address this issue, we develop a co-designed low-precision training framework, termed LNS-Madam, in which we jointly design a logarithmic number system (LNS) and a multiplicative weight update algorithm (Madam). We prove that LNS-Madam incurs low quantization error during weight updates, leading to stable performance even when precision is limited. We further propose a hardware design of LNS-Madam that resolves practical challenges in implementing an efficient datapath for LNS computations. Our implementation effectively reduces the energy overhead incurred by LNS-to-integer conversion and partial sum accumulation. Experimental results show that LNS-Madam achieves accuracy comparable to full-precision counterparts with only 8 bits on popular computer vision and natural language tasks. Compared to FP32 and FP8, LNS-Madam reduces energy consumption by over 90% and 55%, respectively.
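To make the co-design intuition concrete, the sketch below (a minimal illustration, not the paper's implementation) shows how a Madam-style multiplicative update becomes a simple additive step when weights are stored as a sign plus a quantized base-2 logarithm, which is why little quantization error is introduced per update. The function names, the learning rate `lr`, the gradient normalization, and the fractional resolution `gamma` are illustrative assumptions.

```python
import numpy as np

def quantize_log(x, gamma=8):
    """Quantize |x| onto a base-2 logarithmic grid with spacing 1/gamma (hypothetical format)."""
    sign = np.sign(x)
    log_mag = np.log2(np.abs(x) + 1e-30)          # avoid log of zero
    log_q = np.round(log_mag * gamma) / gamma     # fixed-point log-domain value
    return sign, log_q

def dequantize_log(sign, log_q):
    """Recover a real-valued tensor from its sign / log-domain representation."""
    return sign * np.exp2(log_q)

def madam_step_lns(sign, log_q, grad, lr=0.01, gamma=8):
    """One Madam-style multiplicative step applied directly to log-domain weights.

    The multiplicative update W <- W * 2^(-lr * sign(W) * g_hat) is additive on
    log2|W|, so the only extra error is the re-quantization back onto the log grid.
    """
    g_hat = grad / (np.sqrt(np.mean(grad ** 2)) + 1e-12)   # scale-normalized gradient (assumed form)
    log_q = log_q - lr * sign * g_hat                      # additive update in the log domain
    log_q = np.round(log_q * gamma) / gamma                # snap back to the quantized grid
    return sign, log_q

# Toy usage: one update step on a random weight tensor.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
g = rng.normal(size=(4, 4)).astype(np.float32)
s, lq = quantize_log(w)
s, lq = madam_step_lns(s, lq, g)
w_new = dequantize_log(s, lq)
```

Because the update never leaves the log domain, no high-precision weight copy is needed; the quantized logarithm itself serves as the master weight, matching the framework's goal of training directly with low-precision weights.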