The learning rate is one of the most important hyper-parameters and has a significant influence on neural network training. Learning rate schedules are widely used in practice to adjust the learning rate according to pre-defined schedules for fast convergence and good generalization. However, existing learning rate schedules are all heuristic algorithms that lack theoretical support. Therefore, people usually choose learning rate schedules through multiple ad-hoc trials, and the obtained schedules are sub-optimal. To boost the performance of such sub-optimal learning rate schedules, we propose a generic learning rate schedule plugin, called LEArning Rate Perturbation (LEAP), which can be applied to various learning rate schedules to improve model training by introducing a certain perturbation to the learning rate. We find that, with this simple yet effective strategy, the training process exponentially favors flat minima over sharp minima with guaranteed convergence, which leads to better generalization. In addition, we conduct extensive experiments showing that training with LEAP improves the performance of diverse deep learning models on diverse datasets using various learning rate schedules (including a constant learning rate).
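To make the idea of a "schedule plugin" concrete, the sketch below perturbs the value produced by a base schedule before it is handed to the optimizer. The cosine base schedule, the multiplicative zero-mean Gaussian noise, and all function names (cosine_schedule, perturbed_lr) and parameters (sigma) are illustrative assumptions, not the exact LEAP formulation described in the paper.

```python
import math
import random

def cosine_schedule(step, total_steps, base_lr=0.1, min_lr=1e-4):
    """A standard cosine-annealing schedule, used here only as an example base schedule."""
    t = step / max(1, total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

def perturbed_lr(step, total_steps, sigma=0.1, seed=None):
    """Return the scheduled learning rate with a small zero-mean perturbation.

    sigma controls the relative perturbation magnitude (an assumed hyper-parameter);
    the result is clipped to stay positive so training remains stable.
    """
    rng = random.Random(None if seed is None else seed + step)
    lr = cosine_schedule(step, total_steps)
    noise = rng.gauss(0.0, sigma)           # zero-mean multiplicative perturbation
    return max(lr * (1.0 + noise), 1e-8)    # keep the learning rate strictly positive

if __name__ == "__main__":
    total = 1000
    for step in (0, 250, 500, 750, 999):
        print(step, round(perturbed_lr(step, total, sigma=0.1, seed=42), 5))
```

In a training loop, the perturbed value would simply replace the learning rate that the base schedule would otherwise assign at each step, so the plugin can wrap any existing schedule, including a constant one.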