Large-batch training has been essential for leveraging large-scale datasets and models in deep learning. While using large batch sizes is computationally beneficial, it often requires a specially designed learning rate (LR) schedule to achieve performance comparable to that of small-batch training. In particular, when the number of training epochs is constrained, the use of a large LR and a warmup strategy is critical to the final performance of large-batch training because of the reduced number of update steps. In this work, we propose an automated LR scheduling algorithm that is effective for neural network training with a large batch size under a given epoch budget. Specifically, the schedule consists of two phases: adaptive warmup and predefined decay, in which the LR is increased until the training loss no longer decreases and then decayed to zero by the end of training. Whether the training loss has reached its minimum is checked robustly with Gaussian process smoothing, in an online manner and with low computational burden. Coupled with adaptive stochastic optimizers such as AdamP and LAMB, the proposed scheduler adjusts the LR without cumbersome hyperparameter tuning and achieves performance comparable to or better than tuned baselines on various image classification benchmarks and architectures across a wide range of batch sizes.
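To make the two-phase idea concrete, the following is a minimal sketch, not the authors' exact implementation: the LR is multiplied by a growth factor while the GP-smoothed training loss keeps decreasing, and once a plateau is detected the schedule switches to a predefined decay to zero over the remaining steps. The growth factor, smoothing window, GP kernel, and linear decay shape are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


class TwoPhaseLRScheduler:
    """Adaptive warmup followed by a predefined decay (sketch)."""

    def __init__(self, base_lr, total_steps, growth=1.05, window=50):
        self.lr = base_lr
        self.total_steps = total_steps
        self.growth = growth      # multiplicative LR increase per step (assumed value)
        self.window = window      # number of recent losses used for smoothing
        self.losses = []
        self.step_count = 0
        self.warmup_done = False
        self.peak_lr = base_lr
        self.decay_start = 0

    def _loss_plateaued(self):
        """Fit a GP to the recent losses and check whether the smoothed
        loss curve is still decreasing at the latest step."""
        if len(self.losses) < self.window:
            return False
        y = np.array(self.losses[-self.window:])
        x = np.arange(self.window, dtype=float).reshape(-1, 1)
        gp = GaussianProcessRegressor(
            kernel=RBF(length_scale=10.0) + WhiteKernel(noise_level=1e-2),
            normalize_y=True,
        )
        gp.fit(x, y)
        smooth = gp.predict(x)
        # Plateau if the smoothed loss at the end is not lower than earlier.
        return smooth[-1] >= smooth[self.window // 2]

    def step(self, train_loss):
        """Record the latest training loss and return the LR for the next step."""
        self.step_count += 1
        self.losses.append(train_loss)
        if not self.warmup_done:
            if self._loss_plateaued():
                # End of adaptive warmup: freeze the peak LR and start decay.
                self.warmup_done = True
                self.peak_lr = self.lr
                self.decay_start = self.step_count
            else:
                self.lr *= self.growth
        else:
            # Predefined decay: linear from the peak LR to zero at the end.
            remaining = max(self.total_steps - self.decay_start, 1)
            progress = (self.step_count - self.decay_start) / remaining
            self.lr = self.peak_lr * max(1.0 - progress, 0.0)
        return self.lr
```

In use, `scheduler.step(loss.item())` would be called once per update and the returned value written into the optimizer's LR; the GP fit only touches the last `window` losses, which keeps the plateau check cheap.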