Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on the availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles that reduce reliance on trial and error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only a few training samples. These results are promising given that input compression is broadly applicable wherever MI can be estimated with confidence.
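As a rough illustration of the pipeline described above, the following Python sketch bounds I(X;T) with the standard Gaussian-channel identity applied to an (assumed) infinite-width NNGP kernel, and then plugs that estimate into one common statement of the input compression bound (after Shwartz-Ziv et al.). The RBF kernel, the noise variance, and the exact constants in `input_compression_ge_bound` are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def gaussian_channel_mi(kernel, noise_var):
    """Upper-bound I(X; T) in nats for a final-layer representation modelled as
    T = f(X) + eps, eps ~ N(0, noise_var * I), where `kernel` is the (m x m)
    covariance of f on the training inputs (e.g. an NNGP kernel).
    Uses the Gaussian channel identity I = 0.5 * logdet(I + K / noise_var)."""
    m = kernel.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(m) + kernel / noise_var)
    return 0.5 * logdet

def input_compression_ge_bound(mi_bits, m, delta=0.05):
    """Hypothetical form of the input-compression GE bound:
    GE <= sqrt((2**I(X;T) + log(2/delta)) / (2m)), with I(X;T) in bits and
    m training samples. The constants may differ from the paper's statement."""
    return np.sqrt((2.0 ** mi_bits + np.log(2.0 / delta)) / (2.0 * m))

# Toy usage: an RBF kernel stands in for an NNGP kernel on m random inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)                      # placeholder kernel matrix

mi_nats = gaussian_channel_mi(K, noise_var=0.1)
mi_bits = mi_nats / np.log(2.0)
print("MI estimate (bits):", mi_bits)
print("GE bound:", input_compression_ge_bound(mi_bits, m=64))
```

With a realistic NNGP kernel (e.g. from a library such as `neural-tangents`) in place of the RBF placeholder, the same two steps give the MI-to-GE link studied empirically in the paper.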