The gradient flow (GF) is an ODE whose explicit Euler discretization is the gradient descent method. In this work, we investigate a family of methods derived from \emph{approximate implicit discretizations} of (\GF), drawing a connection between larger stability regions and less sensitive hyperparameter tuning. We focus on the implicit $\tau$-step backward differentiation formulas (BDFs), approximated in an inner loop by a few iterations of vanilla gradient descent, and give their convergence rates when the objective function is convex, strongly convex, or nonconvex. Numerical experiments show the wide range of effects these methods have on extremely ill-conditioned problems, in particular those arising in the training of deep neural networks.
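As a minimal illustration of the setup (with notation assumed here rather than fixed by the abstract: $f$ denotes the objective, $x_k$ the iterates, and $h > 0$ the step size), the gradient flow, its explicit Euler discretization, and the simplest implicit discretization ($\tau = 1$, i.e.\ implicit Euler) read
\[
\dot{x}(t) = -\nabla f\bigl(x(t)\bigr),
\qquad
x_{k+1} = x_k - h\,\nabla f(x_k),
\qquad
x_{k+1} = x_k - h\,\nabla f(x_{k+1}).
\]
The implicit step corresponds (via its first-order optimality condition) to the proximal subproblem $x_{k+1} = \arg\min_{x} \, f(x) + \tfrac{1}{2h}\lVert x - x_k \rVert^2$, which the inner loop of gradient descent iterations solves only approximately.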