The optimization step in many machine learning problems rarely relies on vanilla gradient descent; instead, it is common practice to use momentum-based accelerated methods. Although these algorithms are widely applied to arbitrary loss functions, their behaviour in generic non-convex, high-dimensional landscapes is poorly understood. In this work, we use dynamical mean-field theory techniques to describe analytically the average dynamics of these methods in a prototypical non-convex model: the (spiked) matrix-tensor model. We derive a closed set of equations that describe the behaviour of heavy-ball momentum and Nesterov acceleration in the infinite-dimensional limit. By numerically integrating these equations, we observe that these methods speed up the dynamics but do not improve the algorithmic threshold of gradient descent in the spiked model.
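For readers unfamiliar with the two accelerated methods named above, the following is a minimal sketch of their standard update rules on a toy one-dimensional quadratic loss. This is purely illustrative: the step size, momentum coefficient, and loss are arbitrary choices, not values from this work, and the sketch has no connection to the mean-field analysis itself.

```python
# Toy loss f(x) = x^2 / 2, so grad f(x) = x. The hyperparameters lr and
# beta below are illustrative, not taken from the paper.

def grad(x):
    return x

def heavy_ball(x0, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball momentum: velocity accumulates past gradients."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x = x + v
    return x

def nesterov(x0, lr=0.1, beta=0.9, steps=200):
    """Nesterov acceleration: gradient evaluated at the look-ahead point."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x + beta * v)
        x = x + v
    return x

# Both iterates approach the minimizer x* = 0.
print(heavy_ball(10.0))
print(nesterov(10.0))
```

The only difference between the two updates is where the gradient is evaluated: at the current iterate (heavy ball) or at the momentum-extrapolated "look-ahead" point (Nesterov).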