While momentum-based methods, in conjunction with stochastic gradient descent (SGD), are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods. In this work, we first show that there exists a convex loss function for which algorithmic stability fails to establish generalization guarantees when SGD with standard heavy-ball momentum (SGDM) is run for multiple epochs. Then, for smooth Lipschitz loss functions, we analyze a modified momentum-based update rule, i.e., SGD with early momentum (SGDEM), and show that it admits an upper bound on the generalization error. Thus, our results show that machine learning models can be trained for multiple epochs of SGDEM with a guarantee of generalization. Finally, for the special case of strongly convex loss functions, we find a range of momentum values such that multiple epochs of standard SGDM, as a special form of SGDEM, also generalizes. Extending our results on generalization, we also develop an upper bound on the expected true risk, in terms of the number of training steps, the size of the training set, and the momentum parameter. Experimental evaluations verify the consistency between the numerical results and our theoretical bounds, and demonstrate the effectiveness of SGDEM for smooth Lipschitz loss functions.
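For context, a minimal sketch of the heavy-ball update underlying SGDM is given below; the piecewise momentum schedule shown for SGDEM is an illustrative assumption (momentum applied only during an early phase of training, up to a hypothetical switching time $t_0$), not necessarily the exact schedule analyzed in the paper.
\[
\mathbf{w}_{t+1} = \mathbf{w}_t - \alpha\, \nabla f_{i_t}(\mathbf{w}_t) + \mu_t \left( \mathbf{w}_t - \mathbf{w}_{t-1} \right),
\qquad
\mu_t =
\begin{cases}
\mu, & t \le t_0 \quad \text{(early-momentum phase)},\\[2pt]
0, & t > t_0 \quad \text{(plain SGD thereafter)},
\end{cases}
\]
where $\alpha$ is the step size, $\mu \in [0,1)$ is the momentum parameter, and $f_{i_t}$ is the loss on the example sampled at step $t$. Setting $\mu_t \equiv \mu$ for all $t$ recovers standard SGDM as a special case.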