Modern deep learning (DL) architectures are trained using variants of the SGD algorithm run with a $\textit{manually}$ defined learning rate schedule, i.e., the learning rate is dropped at pre-defined epochs, typically when the training loss is expected to saturate. In this paper we develop an algorithm that realizes the learning rate drop $\textit{automatically}$. The proposed method, which we refer to as AutoDrop, is motivated by the observation that the angular velocity of the model parameters, i.e., the rate of change of the convergence direction, initially increases rapidly for a fixed learning rate and then progresses towards soft saturation. At saturation the optimizer slows down, thus the saturation of the angular velocity is a good indicator for dropping the learning rate. After the drop, the angular velocity "resets" and follows the previously described pattern: it increases again until saturation. We show that our method improves over SOTA training approaches: it accelerates the training of DL models and leads to better generalization. We also show that our method does not require any extra hyperparameter tuning. AutoDrop is furthermore extremely simple to implement and computationally cheap. Finally, we develop a theoretical framework for analyzing our algorithm and provide convergence guarantees.
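To make the idea concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) of how one might monitor the angular velocity of the parameters at the end of each epoch and drop the learning rate when that velocity stops increasing. The class name `AngularVelocityLRDropper`, the saturation threshold `eps`, the drop `factor`, and the epoch-level granularity are all illustrative assumptions rather than the paper's exact settings.

```python
import torch


def flatten(params):
    """Concatenate all model parameters into a single 1-D vector."""
    return torch.cat([p.detach().reshape(-1) for p in params])


class AngularVelocityLRDropper:
    """Heuristic LR dropper (sketch): tracks the angle between consecutive
    epoch-level parameter-update directions and drops the learning rate
    when that angular velocity no longer increases noticeably."""

    def __init__(self, optimizer, factor=0.1, eps=1e-3):
        self.optimizer = optimizer
        self.factor = factor          # multiplicative LR drop (assumed value)
        self.eps = eps                # saturation threshold (assumed value)
        self.prev_params = None
        self.prev_direction = None
        self.prev_angle = None

    def step(self, model):
        """Call once per epoch after training updates."""
        params = flatten(model.parameters())
        if self.prev_params is not None:
            direction = params - self.prev_params
            if self.prev_direction is not None:
                cos = torch.nn.functional.cosine_similarity(
                    direction, self.prev_direction, dim=0)
                angle = torch.arccos(cos.clamp(-1.0, 1.0)).item()
                if (self.prev_angle is not None
                        and angle - self.prev_angle < self.eps):
                    # Angular velocity has saturated: drop the learning rate.
                    for group in self.optimizer.param_groups:
                        group["lr"] *= self.factor
                    # After a drop the angle pattern "resets"; clear history.
                    angle = None
                self.prev_angle = angle
            self.prev_direction = direction
        self.prev_params = params
```

In use, one would call `dropper.step(model)` at the end of each training epoch; the optimizer's learning rate is then lowered automatically whenever the measured angular velocity stops growing, mimicking a manual schedule without pre-defined drop epochs.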