This paper discusses estimation of the generalization gap, the difference between generalization error and empirical error, for overparameterized models (e.g., neural networks). We first show that the functional variance, a key concept in defining the widely applicable information criterion (WAIC), characterizes the generalization gap even in overparameterized settings where conventional theory cannot be applied. We then propose a computationally efficient approximation of the functional variance, the Langevin approximation of the functional variance (Langevin FV). This method uses only the first-order gradient of the squared loss function, without referencing the second-order gradient; this keeps the computation efficient and the implementation consistent with gradient-based optimization algorithms. We demonstrate Langevin FV numerically by estimating the generalization gaps of overparameterized linear regression and non-linear neural network models.
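As a minimal sketch of the idea described above (not the paper's implementation), the functional variance can be approximated by running Langevin dynamics driven only by the first-order gradient of the squared loss, then summing the variance of the per-example losses over the resulting chain. All function and parameter names below (`langevin_fv`, `grad_loss`, `per_example_loss`, the step size, and the inverse temperature) are hypothetical choices for illustration:

```python
import numpy as np

def langevin_fv(grad_loss, per_example_loss, w0, X, y,
                n_steps=2000, burn_in=1000, step_size=1e-4, inv_temp=1.0):
    """Sketch: estimate the functional variance via Langevin dynamics.

    Draws parameter samples with an (unadjusted) Langevin update that uses
    only the first-order gradient of the loss, then returns the summed
    variance of the per-example losses over the chain.
    """
    w = w0.copy()
    losses = []  # per-example losses along the chain
    for t in range(n_steps):
        noise = np.random.randn(*w.shape)
        # Langevin update: gradient step plus injected Gaussian noise;
        # no second-order information (Hessian) is needed.
        w = (w - step_size * grad_loss(w, X, y)
             + np.sqrt(2.0 * step_size / inv_temp) * noise)
        if t >= burn_in:
            losses.append(per_example_loss(w, X, y))
    losses = np.stack(losses)  # shape: (kept samples, n examples)
    # functional variance: sum over examples of the variance over the chain
    return losses.var(axis=0).sum()

# Hypothetical usage: overparameterized linear regression, y ~ X @ w
n, d = 50, 200  # fewer examples than parameters
rng = np.random.default_rng(0)
X, w_true = rng.standard_normal((n, d)), rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

per_example_loss = lambda w, X, y: (y - X @ w) ** 2
grad_loss = lambda w, X, y: -2.0 * X.T @ (y - X @ w) / n
fv_estimate = langevin_fv(grad_loss, per_example_loss, np.zeros(d), X, y)
```

Note how the update rule touches only `grad_loss`, mirroring the abstract's point that the method stays consistent with gradient-based optimization and avoids second-order computation.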