We show that the input correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range. This structure is mirrored in a network trained on such data: we show that the Hessian and the Fisher Information Matrix (FIM) have eigenvalues that are spread uniformly over exponentially large ranges. We call such eigenspectra "sloppy" because sets of weights corresponding to small eigenvalues can be changed by large magnitudes without affecting the loss. Networks trained on atypical datasets with non-sloppy inputs do not share these traits, and deep networks trained on such datasets generalize poorly. Inspired by this, we study the hypothesis that sloppiness of inputs aids generalization in deep networks. We show that if the Hessian is sloppy, we can compute non-vacuous PAC-Bayes generalization bounds analytically. By exploiting our empirical observation that training predominantly takes place in the non-sloppy subspace of the FIM, we develop data-distribution-dependent PAC-Bayes priors that lead to accurate generalization bounds via numerical optimization.
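The following is a minimal sketch, not the authors' code, of how one might inspect the eigenspectrum of an input correlation matrix as described above. It uses synthetic Gaussian data with a planted log-uniform spectrum purely to illustrate what a "sloppy" eigenspectrum looks like; the dimensions, sample size, and planted range are illustrative assumptions, and the paper's claim concerns real classification datasets.

```python
# Sketch: compute the input correlation matrix and check whether its
# eigenvalues are spread roughly uniformly on a logarithmic scale
# (a "sloppy" spectrum), as described in the abstract.
import numpy as np

rng = np.random.default_rng(0)

# Assumed stand-in for a real dataset: n samples with d input dimensions.
n, d = 5000, 256

# Planted covariance whose eigenvalues are uniform on a log scale over an
# exponentially large range (8 decades here, chosen for illustration).
planted_eigs = np.logspace(0, -8, d)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
sqrt_cov = (Q * np.sqrt(planted_eigs)) @ Q.T
X = rng.standard_normal((n, d)) @ sqrt_cov

# Input correlation matrix (second moment of the inputs) and its spectrum.
C = X.T @ X / n
eigs = np.linalg.eigvalsh(C)[::-1]      # sorted, largest first
log_eigs = np.log10(eigs / eigs[0])     # normalized to the largest eigenvalue

# A sloppy spectrum shows log-eigenvalues decaying roughly linearly with
# index, i.e. eigenvalues distributed uniformly over many decades.
print("eigenvalue range (decades):", -log_eigs[-1])
print("first 10 normalized log10 eigenvalues:", np.round(log_eigs[:10], 2))
```

The same diagnostic applies, under the abstract's claims, to the Hessian and FIM of a trained network once their eigenvalues are available.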