Recently, the information-theoretic framework has been shown to yield non-vacuous generalization bounds for large models trained by Stochastic Gradient Langevin Dynamics (SGLD) with isotropic noise. In this paper, we optimize the information-theoretic generalization bound by manipulating the noise structure in SGLD. We prove that, under a constraint that guarantees low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance when the prior and the posterior are jointly optimized. This validates that the optimal noise is quite close to the empirical gradient covariance. Technically, we develop a new information-theoretic bound that enables such an optimization analysis, and we then apply matrix analysis to derive the form of the optimal noise covariance. The presented constraint and results are validated by empirical observations.
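To make the noise construction concrete, below is a minimal NumPy sketch of one SGLD step whose noise covariance is set to the square root of the empirical gradient covariance, in line with the result stated above. The function name, step-size scaling, and the omitted temperature factor are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np

def sgld_step_anisotropic(theta, per_example_grads, lr=1e-2, rng=None):
    """One SGLD step with anisotropic noise whose covariance Sigma is the
    square root of the empirical gradient covariance C, i.e. Sigma = C^{1/2}.

    theta: (dim,) parameter vector.
    per_example_grads: (batch, dim) per-example gradients at theta.
    Hypothetical sketch: constants (temperature, step-size factors) are simplified.
    """
    rng = rng or np.random.default_rng()

    g = per_example_grads.mean(axis=0)                       # mini-batch gradient estimate
    centered = per_example_grads - g
    C = centered.T @ centered / per_example_grads.shape[0]   # empirical gradient covariance

    # Noise ~ N(0, lr * C^{1/2}) is drawn through C^{1/4}, since
    # C^{1/4} (C^{1/4})^T = C^{1/2}. C is symmetric PSD, so use eigh.
    w, V = np.linalg.eigh(C)
    w = np.clip(w, 0.0, None)                                # guard tiny negative eigenvalues
    C_quarter = (V * w ** 0.25) @ V.T                        # C^{1/4}
    noise = np.sqrt(lr) * C_quarter @ rng.standard_normal(theta.shape)

    return theta - lr * g + noise
```

The eigendecomposition route is used here because the empirical gradient covariance is symmetric positive semidefinite, which makes the matrix square root well defined without a Cholesky factorization.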