We introduce a family of stochastic optimization methods based on the Runge-Kutta-Chebyshev (RKC) schemes. The RKC methods are explicit methods originally designed for solving stiff ordinary differential equations by ensuring that their stability regions are of maximal size. In the optimization context, this allows for larger step sizes (learning rates) and better robustness compared to, e.g., the popular stochastic gradient descent (SGD) method. Our main contribution is a convergence proof for essentially all stochastic Runge-Kutta optimization methods. This shows convergence in expectation with an optimal sublinear rate under standard assumptions of strong convexity and Lipschitz-continuous gradients. For non-convex objectives, we instead obtain convergence of the gradients to zero in expectation. The proof requires certain natural conditions on the Runge-Kutta coefficients, and we further demonstrate that the RKC schemes satisfy these. Finally, we illustrate the improved stability properties of the methods in practice by performing numerical experiments on both a small-scale test example and a problem arising from an image classification application in machine learning.
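To give a rough picture of the type of iteration involved, the following Python sketch applies a damped, first-order, s-stage RKC scheme to the gradient flow x' = -∇f(x), with the gradient supplied by a (possibly stochastic) oracle. The coefficient construction (the parameters w0, w1, the Chebyshev recursion, and the damping parameter eps), as well as the function names rkc1_coefficients and srkc_step, are assumptions made for illustration; they follow the classical first-order RKC recipe and need not coincide with the exact scheme or coefficient choices analyzed in the paper.

```python
import numpy as np

def rkc1_coefficients(s, eps=0.05):
    """Chebyshev coefficients for a damped, first-order, s-stage RKC scheme.

    This follows the classical construction (illustrative assumption;
    the paper's exact coefficient choice may differ).
    """
    w0 = 1.0 + eps / s**2
    # Chebyshev polynomials T_j(w0) and derivatives T_j'(w0) by recursion.
    T = np.zeros(s + 1)
    dT = np.zeros(s + 1)
    T[0], T[1] = 1.0, w0
    dT[0], dT[1] = 0.0, 1.0
    for j in range(2, s + 1):
        T[j] = 2.0 * w0 * T[j - 1] - T[j - 2]
        dT[j] = 2.0 * T[j - 1] + 2.0 * w0 * dT[j - 1] - dT[j - 2]
    w1 = T[s] / dT[s]
    return w0, w1, T

def srkc_step(x, grad, h, s=10, eps=0.05):
    """One stabilized RKC step for the gradient flow x' = -grad(x).

    `grad` is a (possibly stochastic) gradient oracle and `h` the step size
    (learning rate). Minimal illustrative sketch, not the paper's exact method.
    """
    w0, w1, T = rkc1_coefficients(s, eps)
    y_prev2 = x                                  # internal stage Y_0
    y_prev1 = x - (w1 / w0) * h * grad(x)        # internal stage Y_1
    for j in range(2, s + 1):
        mu = 2.0 * w0 * T[j - 1] / T[j]
        nu = -T[j - 2] / T[j]
        mu_t = 2.0 * w1 * T[j - 1] / T[j]
        # Three-term Chebyshev recurrence for the internal stages Y_j.
        y_new = (mu * y_prev1 + nu * y_prev2
                 + (1.0 - mu - nu) * x
                 - mu_t * h * grad(y_prev1))
        y_prev2, y_prev1 = y_prev1, y_new
    return y_prev1                               # Y_s is the new iterate
```

As a usage example, for a least-squares objective f(x) = ½‖Ax − b‖² one could pass grad = lambda x: A.T @ (A @ x − b), or a mini-batch version of it in the stochastic setting, and call srkc_step repeatedly; the extended real stability interval of the Chebyshev polynomial is what permits larger h than plain gradient descent.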