The Hessian captures important properties of the deep neural network loss landscape. We observe that the eigenvectors and eigenspaces of the layer-wise Hessian of neural network objectives have several interesting structures: top eigenspaces of different models have high overlap, and top eigenvectors form low-rank matrices when reshaped to the same shape as the corresponding weight matrix. These structures, as well as the low-rank structure of the Hessian observed in previous studies, can be explained by approximating the Hessian with a Kronecker factorization. Our new understanding also explains why some of these structures weaken when the network is trained with batch normalization. Finally, we show that the Kronecker factorization can be combined with PAC-Bayes techniques to obtain better explicit generalization bounds.
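To illustrate why a Kronecker-factored Hessian implies the low-rank structure mentioned above, here is a minimal NumPy sketch (not from the paper; the sample data and names such as `A_samples` and `B_samples` are hypothetical). If the layer-wise Hessian is approximated as `A ⊗ B`, its eigenvectors are Kronecker products of the factors' eigenvectors, so reshaping a top eigenvector to the weight-matrix shape yields a rank-1 matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 5, 3, 1000

# hypothetical data: layer inputs and per-sample PSD output Hessians
A_samples = rng.normal(size=(n, d_in))
G = rng.normal(size=(n, d_out, d_out))
B_samples = G @ G.transpose(0, 2, 1)        # each B_i = G_i G_i^T is PSD

# Kronecker-factored approximation: H ≈ A ⊗ B
A = (A_samples.T @ A_samples) / n           # (d_in, d_in) input second moment
B = B_samples.mean(axis=0)                  # (d_out, d_out) mean output Hessian
H_kfac = np.kron(A, B)

# eigenvectors of A ⊗ B are Kronecker products of eigenvectors of A and B,
# so the top eigenvector of H_kfac is kron(top of A, top of B)
wA, VA = np.linalg.eigh(A)
wB, VB = np.linalg.eigh(B)
top_vec = np.kron(VA[:, -1], VB[:, -1])

# reshaped to the weight-matrix shape (d_out, d_in), it is an outer
# product of two unit vectors, hence rank 1
M = top_vec.reshape(d_in, d_out).T
print(np.linalg.matrix_rank(M))             # → 1
```

This is the mechanism behind the observation that reshaped top eigenvectors are low-rank: under the Kronecker approximation they are exactly rank one.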