The Hessian captures important properties of the deep neural network loss landscape. Previous works have observed low-rank structure in the Hessians of neural networks. We make several new observations about the top eigenspace of the layer-wise Hessian: top eigenspaces for different models have surprisingly high overlap, and top eigenvectors form low-rank matrices when they are reshaped into the same shape as the corresponding weight matrix. Towards formally explaining such structures of the Hessian, we show that the new eigenspace structure can be explained by approximating the Hessian using Kronecker factorization; we also prove the low-rank structure for random data at random initialization for over-parametrized two-layer neural nets. Our new understanding can explain why some of these structures become weaker when the network is trained with batch normalization. The Kronecker factorization also leads to better explicit generalization bounds.
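The connection between Kronecker factorization and the observed eigenvector structure can be illustrated with a minimal NumPy sketch (not the paper's implementation; the two factor matrices below are random PSD stand-ins for the factors in a Kronecker-style Hessian approximation): if a layer-wise Hessian is approximated as A ⊗ B, its top eigenvector is a Kronecker product of the factors' top eigenvectors, so reshaping it into the weight-matrix shape yields a rank-1 matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 4, 6  # hypothetical layer dimensions

def random_psd(d):
    """Random symmetric positive semi-definite matrix (stand-in factor)."""
    M = rng.standard_normal((d, d))
    return M @ M.T

A = random_psd(d_out)
B = random_psd(d_in)

# Kronecker-factored approximation of a layer-wise Hessian: H ~= A (x) B.
H = np.kron(A, B)

# Take the top eigenvector of H and reshape it to the weight-matrix shape.
eigvals, eigvecs = np.linalg.eigh(H)       # ascending eigenvalues
top = eigvecs[:, -1].reshape(d_out, d_in)  # top eigenvector as a matrix

# Eigenvectors of A (x) B have the form u (x) v, so the reshaped top
# eigenvector is the outer product u v^T, which has rank 1.
print(np.linalg.matrix_rank(top, tol=1e-8))
```

The same reasoning extends to the span of the top-k eigenvectors: each reshapes to a rank-1 matrix built from a few factor eigenvectors, which is one way to see why the reshaped top eigenvectors form low-rank matrices.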