When training deep learning networks, the optimizer and its learning rate are often chosen with little thought or minimal tuning, even though they are crucial for converging quickly to a good-quality minimum of the loss function that also generalizes well to the test dataset. Drawing inspiration from the successful application of cyclical learning rate policies to convolutional networks and datasets in computer vision, we explore how cyclical learning rates can be applied to train transformer-based neural networks for neural machine translation. Through carefully designed experiments, we show that the choice of optimizer and the associated cyclical learning rate policy can have a significant impact on performance. In addition, we establish guidelines for applying cyclical learning rates to neural machine translation tasks. With this work, we hope to raise awareness of the importance of selecting the right optimizer and accompanying learning rate policy and, at the same time, to encourage further research into easy-to-use learning rate policies.
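For readers unfamiliar with the idea, one common form of cyclical learning rate is the triangular policy, in which the learning rate rises linearly from a lower bound to an upper bound and then falls back, repeating every fixed number of steps. The following is a minimal sketch of that schedule only. The function name `triangular_clr` and the values of `base_lr`, `max_lr`, and `step_size` are illustrative assumptions, not settings taken from this work.

```python
import math


def triangular_clr(step, base_lr=1e-4, max_lr=1e-3, step_size=2000):
    """Return the learning rate at a given training step (triangular policy).

    base_lr, max_lr and step_size are hypothetical values chosen for
    illustration; step_size is the number of steps in half a cycle.
    """
    # Index of the current cycle (1-based); one full cycle = 2 * step_size steps.
    cycle = math.floor(1 + step / (2 * step_size))
    # Normalised distance from the peak of the current cycle, in [0, 1].
    x = abs(step / step_size - 2 * cycle + 1)
    # Linear interpolation between base_lr (x = 1) and max_lr (x = 0).
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


if __name__ == "__main__":
    # Print the schedule at a few steps across two cycles.
    for s in (0, 1000, 2000, 3000, 4000, 6000, 8000):
        print(s, triangular_clr(s))
```

In practice the returned value would be assigned to the optimizer's learning rate before each update. The bounds and cycle length generally need to be chosen per optimizer and task.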