Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, leading to serious environmental impacts. One promising approach to reducing this energy cost is representing DNNs with low-precision numbers. While it is common to run forward and backward propagation in low precision, training directly over low-precision weights, without keeping a high-precision copy of the weights, remains an unsolved problem. This is due to complex interactions between learning algorithms and low-precision number systems. To address this, we jointly design a low-precision training framework involving a logarithmic number system (LNS) and a multiplicative weight update method, termed LNS-Madam. LNS has a high dynamic range even at low bitwidths, leading to high energy efficiency and making it well suited for on-board training in energy-constrained edge devices. We design LNS with the flexibility to choose different bases for weights and gradients, as they usually require different quantization gaps and dynamic ranges during training. By drawing the connection between LNS and multiplicative updates, LNS-Madam ensures low quantization error during weight updates, leading to stable convergence even when the bitwidth is limited. Compared to using a fixed-point or floating-point number system and training with popular learning algorithms such as SGD and Adam, our joint design with LNS and the LNS-Madam optimizer achieves better accuracy while requiring a smaller bitwidth. Notably, with only 5 bits for gradients, the proposed training framework achieves accuracy comparable to full-precision state-of-the-art models such as ResNet-50 and BERT. Energy estimates based on an analysis of the math datapath units used during training show that our design achieves over 60x energy reduction compared to FP32 on BERT models.
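To make the core idea concrete, the following is a minimal NumPy sketch of the two ingredients the abstract describes: quantizing values onto a logarithmic (power-of-two) grid, and a simplified Madam-style multiplicative weight update, which scales each weight by a power of two so the updated weight lands back on the LNS grid. The base parameter `gamma`, the bitwidth, and the gradient normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def lns_quantize(x, bits=5, gamma=8):
    """Quantize x to the LNS grid sign * 2^(k/gamma), with k a
    (bits-1)-bit signed integer exponent code (illustrative layout)."""
    sign = np.sign(x)
    mag = np.maximum(np.abs(x), 1e-30)       # avoid log2(0)
    k = np.round(gamma * np.log2(mag))       # integer exponent code
    k_max = 2 ** (bits - 1) - 1
    k = np.clip(k, -k_max, k_max)            # saturate to the code range
    return sign * 2.0 ** (k / gamma)

def madam_step(w, g, lr=0.01):
    """Simplified multiplicative update: each weight is scaled by a
    power of two, so w stays representable in a base-2 LNS."""
    g_norm = g / (np.sqrt(np.mean(g ** 2)) + 1e-12)  # RMS-normalize gradient
    return w * 2.0 ** (-lr * np.sign(w) * g_norm)
```

Because the update multiplies by `2**(...)` rather than adding a small increment, the relative step size is uniform across weight magnitudes, which matches the logarithmic spacing of the LNS grid and is the intuition behind pairing LNS with a multiplicative optimizer.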

