Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts. One promising approach to reducing this energy cost is representing DNNs with low-precision numbers. While it is common to run forward and backward propagation in low precision, training directly over low-precision weights, without keeping a high-precision copy of the weights, remains an unsolved problem. This is due to complex interactions between learning algorithms and low-precision number systems. To address this, we jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update method, termed LNS-Madam. LNS has a high dynamic range even at low bitwidths, leading to high energy efficiency and making it well suited for on-board training in energy-constrained edge devices. We design LNS with the flexibility to choose different bases for weights and gradients, as they usually require different quantization gaps and dynamic ranges during training. By drawing the connection between LNS and multiplicative updates, LNS-Madam ensures low quantization error during weight updates, leading to stable convergence even when the bitwidth is limited. Compared to using a fixed-point or floating-point number system and training with popular learning algorithms such as SGD and Adam, our joint design with LNS and the LNS-Madam optimizer achieves better accuracy while requiring a smaller bitwidth. Notably, with only 5 bits for gradients, the proposed training framework achieves accuracy comparable to full-precision state-of-the-art models such as ResNet-50 and BERT. Energy estimates based on an analysis of the math datapath units used during training show that our design achieves over 60x energy reduction compared to FP32 on BERT models.
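To make the core idea concrete, the following is a minimal NumPy sketch of the two ingredients the abstract describes: quantizing values onto a logarithmic (power-of-two) grid, and a simplified Madam-style multiplicative weight update, which scales each weight by a power of two so the updated weight lands back on the LNS grid. The base parameter `gamma`, the bitwidth, and the gradient normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def lns_quantize(x, bits=5, gamma=8):
    """Quantize x to the LNS grid sign * 2^(k/gamma), with k a
    (bits-1)-bit signed integer exponent code (illustrative layout)."""
    sign = np.sign(x)
    mag = np.maximum(np.abs(x), 1e-30)       # avoid log2(0)
    k = np.round(gamma * np.log2(mag))       # integer exponent code
    k_max = 2 ** (bits - 1) - 1
    k = np.clip(k, -k_max, k_max)            # saturate to the code range
    return sign * 2.0 ** (k / gamma)

def madam_step(w, g, lr=0.01):
    """Simplified multiplicative update: each weight is scaled by a
    power of two, so w stays representable in a base-2 LNS."""
    g_norm = g / (np.sqrt(np.mean(g ** 2)) + 1e-12)  # RMS-normalize gradient
    return w * 2.0 ** (-lr * np.sign(w) * g_norm)
```

Because the update multiplies by `2**(...)` rather than adding a small increment, the relative step size is uniform across weight magnitudes, which matches the logarithmic spacing of the LNS grid and is the intuition behind pairing LNS with a multiplicative optimizer.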

