The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, selecting an appropriate learning rate schedule can require non-trivial tuning effort from the user. When such methods appear as inner loops of other algorithms, expecting the user to tune the learning rates may be impractical. To address this, we introduce AutoGD: a gradient descent method that automatically determines whether to increase or decrease the learning rate at a given iteration. We establish the convergence of AutoGD, and show that we can recover the optimal rate of GD (up to a constant) for a broad class of functions without knowledge of smoothness constants. Experiments on a variety of traditional problems and variational inference optimization tasks demonstrate the strong performance of the method and of its extensions, AutoBFGS and AutoLBFGS.
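To make the idea of per-iteration learning rate adjustment concrete, here is a minimal sketch of a gradient descent loop that grows the step size when a step decreases the objective and shrinks it (rejecting the step) otherwise. This is only an illustration of the general mechanism described in the abstract, not the AutoGD rule from the paper; the function `adaptive_gd` and the `grow`/`shrink` factors are assumptions introduced for this example.

```python
# Illustrative sketch only: a generic increase/decrease rule for the learning rate,
# not the AutoGD criterion proposed in the paper.
import numpy as np

def adaptive_gd(f, grad, x0, lr=1.0, grow=2.0, shrink=0.5, max_iter=200, tol=1e-8):
    """Gradient descent with a multiplicative increase/decrease rule for the step size."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # stop once the gradient is (near) zero
            break
        candidate = x - lr * g           # tentative gradient step with current rate
        f_candidate = f(candidate)
        if f_candidate < fx:             # step helped: accept it and try a larger rate
            x, fx = candidate, f_candidate
            lr *= grow
        else:                            # step hurt: reject it and shrink the rate
            lr *= shrink
    return x, fx, lr

if __name__ == "__main__":
    # Ill-conditioned quadratic test problem f(x) = 0.5 * x^T A x.
    A = np.diag([1.0, 100.0])
    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x
    x_star, f_star, lr_final = adaptive_gd(f, grad, x0=[1.0, 1.0])
    print(f"minimizer ~ {x_star}, f ~ {f_star:.2e}, final lr = {lr_final:.3g}")
```

On this kind of quadratic, the multiplicative rule lets the step size settle near the largest value that still yields descent, which is the intuition behind recovering GD-like rates without knowing the smoothness constant in advance.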