An important question in deep learning is how higher-order optimization methods affect generalization. In this work, we analyze a stochastic Gauss-Newton (SGN) method with Levenberg-Marquardt damping and mini-batch sampling for training overparameterized deep neural networks with smooth activations in a regression setting. Our theoretical contributions are twofold. First, we establish finite-time convergence bounds via a variable-metric analysis in parameter space, with explicit dependence on the batch size, network width, and network depth. Second, we derive non-asymptotic generalization bounds for SGN in the overparameterized regime using uniform stability, characterizing how curvature, batch size, and overparameterization affect generalization performance. Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the Gauss-Newton matrix along the optimization path yields tighter stability bounds.
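As a point of reference, the generic mini-batch Gauss-Newton step with Levenberg-Marquardt damping underlying this kind of analysis can be written as

\[
  \theta_{t+1} \;=\; \theta_t \;-\; \eta_t \,\bigl( J_{B_t}^{\top} J_{B_t} + \lambda_t I \bigr)^{-1} J_{B_t}^{\top} r_{B_t},
\]

where \(\theta_t\) denotes the network parameters at iteration \(t\), \(J_{B_t}\) and \(r_{B_t}\) are the Jacobian of the network outputs and the residual vector on the sampled mini-batch \(B_t\), \(\lambda_t > 0\) is the Levenberg-Marquardt damping parameter, and \(\eta_t\) is the step size. This display is a sketch in illustrative notation; the paper's exact step-size and damping schedules are not reproduced here. The damped matrix \(J_{B_t}^{\top} J_{B_t} + \lambda_t I\) is the (mini-batch) Gauss-Newton matrix whose minimum eigenvalue along the optimization path enters the stability bounds described above.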