D-Adaptation is an approach to automatically setting the learning rate that asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function-value or gradient evaluations per step. Our approach is the first hyperparameter-free method for this problem class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. An open-source implementation is available at \url{https://github.com/facebookresearch/dadaptation}.
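As a minimal usage sketch, the snippet below shows how the open-source package can stand in for a hand-tuned optimizer; it assumes PyTorch and the \texttt{dadaptation} package from the repository above, whose \texttt{DAdaptAdam} class follows the standard PyTorch optimizer interface (per the repository's guidance, \texttt{lr} acts as a multiplier on the adapted step size, so \texttt{1.0} leaves the automatic choice unchanged).

\begin{verbatim}
import torch
from dadaptation import DAdaptAdam  # assumes `pip install dadaptation`

model = torch.nn.Linear(10, 1)

# lr=1.0 keeps the D-Adapted step size unscaled; no tuning needed.
optimizer = DAdaptAdam(model.parameters(), lr=1.0)

for step in range(100):
    x = torch.randn(32, 10)
    y = torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # learning rate is estimated internally
\end{verbatim}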