Training deep neural networks (DNNs) demands immense energy, which restricts the development of deep learning and increases carbon emissions. The study of energy-efficient DNN training is therefore essential. During training, the linear layers consume the most energy because of the intensive use of energy-hungry full-precision (FP32) multiplications in multiply-accumulate (MAC) operations. Existing energy-efficient approaches try to lower the precision of these multiplications or replace them with cheaper operations such as addition or bitwise shift. However, none of the existing approaches can replace all of the FP32 multiplications in both forward and backward propagation with low-precision, energy-efficient operations. In this work, we propose an Adaptive Layer-wise Scaling PoT Quantization (ALS-POTQ) method and a Multiplication-Free MAC (MF-MAC) that together replace all of the FP32 multiplications with INT4 additions and 1-bit XOR operations. In addition, we propose Weight Bias Correction and Parameterized Ratio Clipping techniques to stabilize training and improve accuracy. None of these methods introduces extra multiplications, so our training scheme reduces the energy consumption of the linear layers by up to 95.8%. Experimentally, we observe an accuracy degradation of less than 1% for CNN models on ImageNet and for the Transformer model on the WMT En-De task. In summary, our scheme significantly outperforms existing methods in both energy efficiency and accuracy.
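To make the core idea concrete, the sketch below illustrates how a multiply can become multiplication-free once both operands are quantized to signed powers of two: the sign bits combine with a 1-bit XOR and the integer exponents combine with a low-bit addition. This is only an illustrative approximation under assumed power-of-two (PoT) quantization of both operands; the helper `pot_quantize` and the function `mf_mac` are hypothetical names, not the paper's ALS-POTQ or MF-MAC implementations, and the final reconstruction uses FP32 only to display the result.

```python
# Illustrative sketch (not the paper's exact MF-MAC): with both operands quantized
# to sign * 2^e, each FP32 multiply in a dot product reduces to an integer
# exponent addition plus a 1-bit sign XOR.
import numpy as np

def pot_quantize(x, exp_bits=4):
    """Hypothetical helper: quantize values to sign * 2^e with an INT4 exponent e."""
    sign = np.signbit(x).astype(np.int8)                      # 1-bit sign
    e = np.clip(np.round(np.log2(np.abs(x) + 1e-12)),
                -(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1).astype(np.int8)
    return sign, e

def mf_mac(x, w):
    """Multiplication-free dot-product sketch: sign XORs + low-bit exponent adds."""
    sx, ex = pot_quantize(x)
    sw, ew = pot_quantize(w)
    signs = sx ^ sw                                           # 1-bit XOR replaces the sign multiply
    exps = ex + ew                                            # integer addition replaces the magnitude multiply
    # Reconstruction below is FP32 only for illustration of the accumulated value.
    return np.sum(np.where(signs == 1, -1.0, 1.0) * np.exp2(exps.astype(np.float32)))

x = np.random.randn(8).astype(np.float32)
w = np.random.randn(8).astype(np.float32)
print(mf_mac(x, w), np.dot(x, w))                             # PoT approximation vs. exact FP32 dot product
```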