Low-precision deep neural network (DNN) training has gained tremendous attention, as reducing precision is one of the most effective knobs for boosting DNNs' training time/energy efficiency. In this paper, we explore low-precision training from a new perspective, inspired by recent findings in understanding DNN training: we conjecture that DNNs' precision might have an effect similar to that of the learning rate during DNN training, and advocate dynamic precision along the training trajectory for further boosting the time/energy efficiency of DNN training. Specifically, we propose Cyclic Precision Training (CPT) to cyclically vary the precision between two boundary values, which can be identified using a simple precision range test within the first few training epochs. Extensive simulations and ablation studies on five datasets and eleven models demonstrate that CPT's effectiveness is consistent across various models/tasks (including classification and language modeling). Furthermore, through experiments and visualization we show that CPT helps to (1) converge to wider minima with lower generalization error and (2) reduce training variance, which we believe opens up a new design knob for simultaneously improving the optimization and efficiency of DNN training. Our code is available at: https://github.com/RICE-EIC/CPT.
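To make the idea of cyclic precision concrete, the following is a minimal illustrative sketch of a precision schedule that varies cyclically between two boundary bit-widths. It assumes a cosine-shaped cycle and hypothetical parameter names (`bit_min`, `bit_max`, `cycle_length`); the exact schedule, boundary values, and precision range test used by CPT are specified in the paper and the linked repository.

```python
import math

def cyclic_precision(step, cycle_length, bit_min, bit_max):
    """Return a bit-width for the current training step that varies
    cyclically between bit_min and bit_max.

    Illustrative sketch only: assumes a cosine-shaped cycle; CPT's
    actual schedule and boundary values follow the paper/official repo.
    """
    # Position within the current cycle, normalized to [0, 1)
    phase = (step % cycle_length) / cycle_length
    # Cosine ramp from bit_min up to bit_max over one cycle
    bits = bit_min + 0.5 * (bit_max - bit_min) * (1.0 - math.cos(math.pi * phase))
    return int(round(bits))

# Example usage: cycle between 3 and 8 bits with a cycle of 1,000 steps;
# the boundary values would come from the precision range test described above.
for step in (0, 250, 500, 750, 999):
    print(step, cyclic_precision(step, cycle_length=1000, bit_min=3, bit_max=8))
```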