We revisit the Ravine method of Gelfand and Tsetlin from a dynamical systems perspective, study its convergence properties, and highlight its similarities to and differences from the Nesterov accelerated gradient method. The two methods are closely related: each can be deduced from the other by reversing the order of the extrapolation and gradient operations in their definitions, and both enjoy similarly fast convergence of the function values and convergence of the iterates for general convex objective functions. We also establish the high-resolution ODEs of the Ravine and Nesterov methods, and reveal an additional geometric damping term driven by the Hessian for both. This allows us to prove fast convergence of the gradients towards zero, not only for the Ravine method but also, for the first time, for the Nesterov method. We further highlight connections to other algorithms stemming from more subtle discretization schemes, and finally describe a Ravine version of the proximal-gradient algorithm for general structured smooth + nonsmooth convex optimization problems.
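As a concrete illustration (not taken from the paper itself), the following minimal NumPy sketch contrasts the two recursions. The function names nesterov and ravine, the least-squares test problem, the step size s = 1/L, and the standard momentum coefficient (k-1)/(k+2) are illustrative assumptions; note how the Ravine loop performs the same two operations in the opposite order, with the momentum index shifted by one so that its iterate matches Nesterov's extrapolated sequence.

```python
import numpy as np

def nesterov(grad_f, x0, s, n_iter):
    """Nesterov acceleration: extrapolate first, then take a gradient step."""
    x_prev = x = np.asarray(x0, dtype=float)
    for k in range(1, n_iter + 1):
        alpha = (k - 1) / (k + 2)          # vanishing momentum coefficient
        y = x + alpha * (x - x_prev)       # extrapolation
        x_prev, x = x, y - s * grad_f(y)   # gradient step at the extrapolated point
    return x

def ravine(grad_f, y0, s, n_iter):
    """Ravine iteration: gradient step first, then extrapolate."""
    w_prev = y = np.asarray(y0, dtype=float)
    for k in range(1, n_iter + 1):
        w = y - s * grad_f(y)              # gradient step
        alpha = k / (k + 3)                # momentum index shifted by one
        y = w + alpha * (w - w_prev)       # extrapolation
        w_prev = w
    return y

# Hypothetical usage: least squares f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
grad = lambda x: A.T @ (A @ x - b)
s = 1.0 / np.linalg.norm(A, 2) ** 2        # step size 1/L with L = ||A||_2^2
x_nes = nesterov(grad, np.zeros(20), s, 500)
y_rav = ravine(grad, np.zeros(20), s, 500)
```

Up to the first-iteration transient, the two runs above trace out the same sequence of extrapolated points, which is the sense in which the methods can be deduced from one another.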
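For readers unfamiliar with the high-resolution framework, the display below sketches the kind of ODE meant. The exact constants are an assumption borrowed from the high-resolution ODE literature on Nesterov's method (step size $s$, friction coefficient $3/t$), not a statement of this paper's result; the point is the Hessian-driven term $\sqrt{s}\,\nabla^2 f(X)\dot{X}$, which is the geometric damping referred to above.
\[
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t)
  + \sqrt{s}\,\nabla^2 f\bigl(X(t)\bigr)\,\dot{X}(t)
  + \Bigl(1 + \frac{3\sqrt{s}}{2t}\Bigr)\nabla f\bigl(X(t)\bigr) = 0 .
\]
It is this extra damping term, absent from the classical low-resolution ODE $\ddot{X} + (3/t)\dot{X} + \nabla f(X) = 0$, that makes the fast convergence of the gradients tractable.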