Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance. In practical applications, however, there remains an empirical gap between tuned stochastic gradient descent (SGD) and PFSGD. In this paper, we close this empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models. The new update is derived by solving an Ordinary Differential Equation (ODE) and has a closed form. We show empirically that this new parameter-free algorithm outperforms algorithms with the "best default" learning rates and almost matches the performance of finely tuned baselines, without anything to tune.
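For context, here is a minimal sketch of the classic discrete-time coin-betting update (the Krichevsky-Trofimov bettor of Orabona and Pál, 2016) that the continuous-time ODE update proposed here builds on. This is not the proposed algorithm; the function name `coin_betting_sgd`, the toy objective, and the initial wealth `eps` are illustrative assumptions.

```python
import numpy as np

def coin_betting_sgd(grad, x0, eps=1.0, T=1000):
    """Parameter-free optimization via coin betting: no learning rate.

    grad: returns a (sub)gradient at a point, assumed bounded by 1 in norm.
    x0:   starting point; iterates are x0 plus the current bet.
    eps:  initial wealth of the bettor.
    """
    x0 = np.asarray(x0, dtype=float)
    g_sum = np.zeros_like(x0)   # running sum of "coin outcomes" c_s = -g_s
    reward = 0.0                # accumulated reward  sum_s <-g_s, w_s>
    w = np.zeros_like(x0)       # current bet (offset from x0)
    avg = np.zeros_like(x0)     # running average of iterates (convex case)
    for t in range(1, T + 1):
        x = x0 + w
        g = np.asarray(grad(x), dtype=float)
        reward += np.dot(-g, w)                 # wealth gained this round
        g_sum += -g
        w = g_sum / (t + 1) * (eps + reward)    # KT fraction times wealth
        avg += (x - avg) / t
    return avg

# Toy usage on f(x) = |x - 3|, whose subgradients are bounded by 1.
x_hat = coin_betting_sgd(lambda x: np.sign(x - 3.0), x0=np.zeros(1))
print(x_hat)  # the averaged iterate approaches the minimizer 3
```

The bet on each round is a fixed fraction of the accumulated wealth, so no step-size sequence appears anywhere; the sensitivity of this discrete update to large single gradients is part of what motivates the continuous-time ODE derivation.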