The speed of gradient descent for convex Lipschitz functions is highly dependent on the choice of learning rate. Setting the learning rate to achieve the optimal convergence rate requires knowing the distance D from the initial point to the solution set. In this work, we describe a single-loop method, with no backtracking or line searches, which does not require knowledge of $D$ yet asymptotically achieves the optimal rate of convergence for the complexity class of convex Lipschitz functions. Our approach is the first parameter-free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. Our method is practical, efficient, and requires no additional function value or gradient evaluations per step. An open-source implementation is available (https://github.com/facebookresearch/dadaptation).
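To make the idea concrete, the following is a minimal NumPy sketch of a D-Adaptation-style dual-averaging subgradient method, not the authors' exact algorithm: it maintains a running lower bound d on the unknown distance D (derived from the accumulated gradients) and uses it to scale the step size. All names here (`d_adaptation_gd`, `grad`, `d0`) are illustrative; the practical SGD and Adam variants are in the linked repository.

```python
import numpy as np

def d_adaptation_gd(grad, x0, n_steps, d0=1e-6):
    """Illustrative D-Adaptation-style dual averaging for convex Lipschitz f.

    grad(x) returns a (sub)gradient of f at x. d0 is a small initial
    lower bound on D = dist(x0, solution set); it is grown on the fly.
    """
    x = x0.copy()
    s = np.zeros_like(x0)            # weighted sum of gradients
    d = d0                           # running lower bound on D
    sq_norm_sum = 0.0                # sum of lambda_i^2 * ||g_i||^2
    g0_norm = np.linalg.norm(grad(x0))   # assumed nonzero at x0
    x_avg = np.zeros_like(x0)
    weight_sum = 0.0
    for k in range(n_steps):
        g = grad(x)
        lam = d / g0_norm            # step weight lambda_k = d_k / ||g_0||
        x_avg += lam * x             # accumulate weighted iterate average
        weight_sum += lam
        s += lam * g
        sq_norm_sum += lam**2 * np.dot(g, g)
        s_norm = np.linalg.norm(s)
        # Lower bound on D from the dual-averaging inequality:
        #   D >= (||s||^2 - sum_i lambda_i^2 ||g_i||^2) / (2 ||s||)
        if s_norm > 0:
            d = max(d, (s_norm**2 - sq_norm_sum) / (2 * s_norm))
        x = x0 - s                   # dual-averaging iterate
    return x_avg / weight_sum        # weighted average of the iterates

# Example: minimize the convex, Lipschitz function f(x) = ||x - 3||_1 from the origin.
grad = lambda x: np.sign(x - 3.0)
print(d_adaptation_gd(grad, np.zeros(5), n_steps=2000))
```

The estimate d starts far too small, so the first steps are tiny, but each step tightens the lower bound; once d stops growing it stays within a constant factor of D, which is what allows the method to asymptotically match the optimally tuned step size without ever knowing D in advance.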