We provide sharp path-dependent generalization and excess error guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex). At the heart of our analysis is a novel generalization-error technique for deterministic symmetric algorithms, which implies that average output stability, together with a bounded expected gradient of the loss at termination, leads to generalization. This key result shows that small generalization error arises at stationary points, and it allows us to bypass the Lipschitz assumptions on the loss that are prevalent in previous work. For nonconvex, convex, and strongly convex losses, we show the explicit dependence of the generalization error on the accumulated path-dependent optimization error, the terminal optimization error, the number of samples, and the number of iterations. For nonconvex smooth losses, we prove that full-batch GD efficiently generalizes close to any stationary point at termination, under a properly chosen decreasing step size. Further, if the loss is nonconvex but the objective satisfies the Polyak-Łojasiewicz (PL) condition, we derive vanishing bounds on the corresponding excess risk. For convex and strongly convex smooth losses, we prove that full-batch GD generalizes even with large constant step sizes and achieves small excess risk while training fast. Our generalization-error and excess-risk bounds for full-batch GD are significantly tighter than existing bounds for (stochastic) GD when the loss is smooth (but possibly non-Lipschitz).
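For concreteness, the full-batch GD iteration analyzed here is the standard one; the sketch below uses generic placeholder notation ($\ell$, $\hat{R}_S$, $R$, $w_t$, $\eta_t$), not necessarily the paper's own symbols:
\[
\hat{R}_S(w) \;=\; \frac{1}{n}\sum_{i=1}^{n}\ell(w; z_i),
\qquad
w_{t+1} \;=\; w_t - \eta_t \,\nabla \hat{R}_S(w_t),
\]
and the generalization error of the terminal iterate $w_T$ is $\mathbb{E}\big[\,R(w_T) - \hat{R}_S(w_T)\,\big]$, where $R(w) = \mathbb{E}_{z}\big[\ell(w; z)\big]$ denotes the population risk.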