Deep learning practitioners often operate on a computational and monetary budget. Thus, it is critical to design optimization algorithms that perform well under any budget. The linear learning rate schedule is considered the best budget-aware schedule, as it outperforms most other schedules in the low-budget regime. On the other hand, other learning rate schedules, such as the \texttt{30-60-90} step schedule, are known to achieve high performance when the model can be trained for many epochs. Yet, it is often not known a priori whether one's budget will be large or small; thus, the optimal choice of learning rate schedule is made on a case-by-case basis. In this paper, we frame the learning rate schedule selection problem as a combination of $i)$ selecting a profile (i.e., the continuous function that models the learning rate schedule), and $ii)$ choosing a sampling rate (i.e., how frequently the learning rate is updated/sampled from this profile). We propose a novel profile and sampling rate combination called the Reflected Exponential (REX) schedule, which we evaluate across seven different experimental settings with both SGD and Adam optimizers. REX outperforms the linear schedule in the low-budget regime, while matching or exceeding the performance of several state-of-the-art learning rate schedules (linear, step, exponential, cosine, step decay on plateau, and OneCycle) in both high- and low-budget regimes. Furthermore, REX requires no added computation, storage, or hyperparameters.
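The profile/sampling-rate framing can be illustrated with a short sketch. This is not the REX schedule or the paper's implementation: the function names (`linear_profile`, `sampled_schedule`), the use of the linear profile, and the choice to hold the learning rate constant between samples are illustrative assumptions. The sketch shows how the same continuous profile yields a smooth schedule when sampled every step and a step-like schedule when sampled rarely.

```python
from typing import Callable

# Hypothetical type: a "profile" maps normalized training progress z in [0, 1)
# to a multiplier applied to the base learning rate.
Profile = Callable[[float], float]


def linear_profile(z: float) -> float:
    """Linear decay: multiplier 1 at the start of training, approaching 0 at the end."""
    return 1.0 - z


def sampled_schedule(profile: Profile, base_lr: float, total_steps: int,
                     sample_every: int) -> list[float]:
    """Return one learning rate per step, re-sampling `profile` every `sample_every` steps.

    Between samples the learning rate is held constant (an assumption of this
    sketch), so a large `sample_every` gives a piecewise-constant, step-like
    schedule, while `sample_every=1` gives a smooth one.
    """
    lrs = []
    for t in range(total_steps):
        # Most recent step at which the profile was sampled.
        t_sample = (t // sample_every) * sample_every
        z = t_sample / total_steps
        lrs.append(base_lr * profile(z))
    return lrs


# Same profile, two sampling rates.
smooth = sampled_schedule(linear_profile, base_lr=0.1, total_steps=100, sample_every=1)
stepped = sampled_schedule(linear_profile, base_lr=0.1, total_steps=100, sample_every=25)
```

With `sample_every=25`, the learning rate changes only four times over 100 steps, which is how a sampling rate turns a smooth profile into a step schedule.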