Quantization of weights and activations is one of the main methods for reducing the computational footprint of Deep Neural Network (DNN) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes only a third of the training process. Reducing the computational footprint of the entire training process requires quantizing the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers. In this work, we examine the importance of unbiased quantization in quantized neural network training, where it must be maintained, and how. Based on this, we suggest a \textit{logarithmic unbiased quantization} (LUQ) method to quantize both the forward and backward phases to 4-bit, achieving state-of-the-art results in 4-bit training without overhead. For example, in ResNet50 on ImageNet, we achieved a degradation of 1.1\%. We further improve this to a degradation of only 0.32\% after three epochs of high-precision fine-tuning combined with a variance-reduction method, where both of these methods add overhead comparable to previously suggested methods.
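To make the notion of unbiasedness concrete, the following is a minimal sketch of the property required from a gradient quantizer $Q$, assuming a generic stochastic rounding scheme between adjacent logarithmic (power-of-two) levels; it illustrates the condition only and is not necessarily the exact LUQ construction.
\[
\mathbb{E}\left[Q(x)\right] = x,
\qquad
Q(x) =
\begin{cases}
2^{k+1} & \text{with probability } \dfrac{x - 2^{k}}{2^{k+1} - 2^{k}},\\[4pt]
2^{k} & \text{otherwise,}
\end{cases}
\qquad \text{for } x \in [2^{k}, 2^{k+1}].
\]
Keeping this expectation exact for the neural gradients prevents a systematic bias from accumulating through the backward pass, at the cost of a higher variance than round-to-nearest.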