This paper discusses estimating the generalization gap, the difference between the generalization error and the empirical error, for overparameterized models (e.g., neural networks). We first show that the functional variance, a key concept in defining the widely-applicable information criterion (WAIC), characterizes the generalization gap even in overparameterized settings, where conventional theory cannot be applied. We then propose a computationally efficient approximation of the functional variance, a Langevin approximation of the functional variance~(Langevin FV). This method leverages only the first-order gradient of the squared loss function, not its second-order derivatives; it can therefore be computed efficiently and implemented consistently with gradient-based optimization algorithms. We demonstrate the Langevin FV numerically by estimating the generalization gaps of overparameterized linear regression and non-linear neural network models.
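To make the idea concrete, the sketch below illustrates one way a Langevin-type estimator of the functional variance could look for linear regression with squared loss: parameters are perturbed around a trained solution using only first-order gradients plus Gaussian noise, and the functional variance is estimated as the sum over data points of the sample variance of per-example losses. This is a minimal illustrative sketch under assumed scalings and hyperparameters (step_size, n_samples, burn_in, inverse_temp are hypothetical names), not the authors' exact algorithm.

```python
import numpy as np

def langevin_fv(X, y, w_hat, step_size=1e-4, n_samples=200,
                burn_in=100, inverse_temp=1.0, rng=None):
    """Illustrative Langevin-style estimate of the functional variance
    FV = sum_i Var_w[ loss_i(w) ], with w sampled by unadjusted Langevin
    dynamics initialized at a trained solution w_hat (assumed setup)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    w = w_hat.copy()
    per_example_losses = []  # loss_i(w^(t)) for each retained sample

    def grad_loss(w):
        # First-order gradient of the mean squared loss; no Hessian is needed.
        residual = X @ w - y
        return 2.0 * X.T @ residual / n

    for t in range(burn_in + n_samples):
        noise = rng.standard_normal(w.shape)
        # Unadjusted Langevin update: gradient step plus injected Gaussian noise.
        w = (w - step_size * inverse_temp * grad_loss(w)
             + np.sqrt(2.0 * step_size) * noise)
        if t >= burn_in:
            per_example_losses.append((X @ w - y) ** 2)

    L = np.stack(per_example_losses)    # shape (n_samples, n)
    # Sum over data points of the sample variance of each per-example loss.
    return L.var(axis=0, ddof=1).sum()
```

Because the update uses only gradients and noise, it can be run alongside standard gradient-based training loops without any second-order computation.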