We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.