Existing generalization measures that aim to capture a model's simplicity based on parameter counts or norms fail to explain generalization in overparameterized deep neural networks. In this paper, we introduce a new, theoretically motivated measure of a network's simplicity which we call prunability: the smallest \emph{fraction} of the network's parameters that can be kept while pruning without adversely affecting its training loss. We show that this measure is highly predictive of a model's generalization performance across a large set of convolutional networks trained on CIFAR-10, that, unlike existing pruning-based measures, it does not grow with network size, and that it exhibits high correlation with test set loss even in a particularly challenging double descent setting. Lastly, we show that the success of prunability cannot be explained by its relation to known complexity measures based on models' margin, flatness of minima, and optimization speed, finding that our new measure is similar to -- but more predictive than -- existing flatness-based measures, and that its predictions exhibit low mutual information with those of other baselines.
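The definition of prunability above can be illustrated with a minimal sketch on a toy linear model: search for the smallest fraction of weights that can be kept under magnitude pruning while the training loss stays essentially unchanged. The tolerance factor, the magnitude-pruning criterion, and the grid search below are illustrative assumptions for this sketch, not the paper's actual protocol or model class.

```python
import numpy as np

def train_loss(w, X, y):
    # mean squared error of a linear model y ~ X @ w
    return float(np.mean((X @ w - y) ** 2))

def prune(w, keep_frac):
    # magnitude pruning: zero all but the largest-magnitude keep_frac of weights
    k = max(1, int(np.ceil(keep_frac * w.size)))
    keep_idx = np.argsort(np.abs(w))[::-1][:k]
    pruned = np.zeros_like(w)
    pruned[keep_idx] = w[keep_idx]
    return pruned

def prunability(w, X, y, tol=1.05, grid=None):
    # smallest fraction of weights that can be *kept* while the pruned model's
    # training loss stays within a tolerance factor of the dense loss
    # (tol and the grid resolution are assumptions of this sketch)
    if grid is None:
        grid = np.linspace(0.05, 1.0, 20)
    base = train_loss(w, X, y)
    for frac in grid:
        if train_loss(prune(w, frac), X, y) <= tol * base + 1e-12:
            return float(frac)
    return 1.0

# toy demo: only 5 of 50 weights actually matter, so prunability is low
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = np.array([3.0, -2.0, 1.5, 2.5, -1.0])
y = X @ w_true
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
p = prunability(w_hat, X, y)
```

In this synthetic setting the least-squares fit recovers a nearly sparse weight vector, so `p` lands close to 5/50 = 0.1: keeping the five largest-magnitude weights leaves the training loss essentially unchanged, while any smaller fraction drops a weight that matters.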