Machine learning practitioners invest significant manual and computational resources in finding suitable learning rates for optimization algorithms. We provide a probabilistic motivation, in terms of Gaussian inference, for popular stochastic first-order methods. As an important special case, the framework recovers the Polyak step with a general metric. The inference allows us to relate the learning rate to a dimensionless quantity that can be automatically adapted during training by a control algorithm. The resulting meta-algorithm is shown to adapt learning rates robustly across a wide range of initial values when applied to deep learning benchmark problems.
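For context, the classical Polyak step sets the learning rate from the current suboptimality gap, lr_t = (f(x_t) - f_*) / ||grad f(x_t)||^2; under a metric M it becomes lr_t = (f(x_t) - f_*) / ||g||_{M^{-1}}^2 with step direction M^{-1} g. The sketch below is a minimal illustration of this textbook step, not the paper's meta-algorithm; the function polyak_step, the diagonal metric M_inv_diag, and the quadratic toy problem are all illustrative assumptions.

import numpy as np

def polyak_step(x, f_val, grad, M_inv_diag, f_star=0.0, eps=1e-12):
    """One Polyak step under a diagonal metric M, given via its inverse diagonal."""
    Minv_g = M_inv_diag * grad                 # M^{-1} g
    sq_norm = grad @ Minv_g                    # ||g||_{M^{-1}}^2 = g^T M^{-1} g
    lr = (f_val - f_star) / (sq_norm + eps)    # Polyak learning rate
    return x - lr * Minv_g, lr

# Toy usage on the quadratic f(x) = 0.5 * ||x||^2, whose minimum value is 0.
x = np.array([3.0, -4.0])
M_inv = np.ones_like(x)                        # identity metric recovers the standard Polyak step
for t in range(5):
    f_val, grad = 0.5 * x @ x, x
    x, lr = polyak_step(x, f_val, grad, M_inv)
    print(f"step {t}: lr={lr:.3f}, f={0.5 * x @ x:.3e}")

On this toy quadratic the computed learning rate is constant (0.5) and the loss contracts geometrically; in general the Polyak rate shrinks automatically as the iterate approaches the optimum, which is the behavior the abstract's inference perspective generalizes.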