We provide sharp path-dependent generalization and excess risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex). At the heart of our analysis is an upper bound on the generalization error, which implies that average output stability and a bounded expected optimization error at termination lead to generalization. This result shows that a small generalization error occurs along the optimization path, and allows us to bypass the Lipschitz or sub-Gaussian assumptions on the loss prevalent in previous works. For nonconvex, convex, and strongly convex losses, we show the explicit dependence of the generalization error on the accumulated path-dependent optimization error, the terminal optimization error, the number of samples, and the number of iterations. For nonconvex smooth losses, we prove that full-batch GD efficiently generalizes close to any stationary point at termination, and recovers the generalization error guarantees of stochastic algorithms under fewer assumptions. For smooth convex losses, we show that the generalization error is tighter than existing bounds for SGD (by up to an order of magnitude in the error). Consequently, the excess risk matches that of SGD with quadratically fewer iterations. Lastly, for strongly convex smooth losses, we show that full-batch GD achieves essentially the same excess risk rate as the state of the art for SGD, but with an exponentially smaller number of iterations (logarithmic in the dataset size).
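For reference, full-batch GD here denotes the standard iteration on the empirical risk; a minimal statement in standard notation follows (the symbols $\widehat{R}_S$, $R$, $\eta_t$, and $\epsilon_{\mathrm{gen}}$ are assumed for illustration and are not taken verbatim from the paper):

% Training set S = (z_1, ..., z_n), smooth loss f(theta; z), step sizes eta_t.
\[
  \widehat{R}_S(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} f(\theta; z_i),
  \qquad
  \theta_{t+1} \;=\; \theta_t - \eta_t \nabla \widehat{R}_S(\theta_t),
\]
% Generalization error of the output theta_T: gap between population and empirical risk.
\[
  \epsilon_{\mathrm{gen}}(\theta_T) \;=\; \mathbb{E}\!\left[ R(\theta_T) - \widehat{R}_S(\theta_T) \right],
  \qquad
  R(\theta) \;=\; \mathbb{E}_{z}\, f(\theta; z).
\]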