We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks, and discover excellent agreement with Gaussian Orthogonal Ensemble statistics across several network architectures and datasets. These results shed new light on the applicability of Random Matrix Theory to modelling neural networks and suggest a previously unrecognised role for it in the study of loss surfaces in deep learning. Inspired by these observations, we propose a novel model for the true loss surfaces of neural networks which is consistent with our observations, allows for Hessian spectral densities with the rank degeneracy and outliers extensively observed in practice, and predicts a growing independence of loss gradients as a function of distance in weight-space. We further investigate the importance of the true loss surface in neural networks and find, in contrast to previous work, that the exponential hardness of locating the global minimum has practical consequences for achieving state-of-the-art performance.
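As a minimal illustration of the kind of comparison described above (not the paper's actual experimental pipeline), the sketch below computes consecutive eigenvalue spacing ratios, a standard local spectral statistic that avoids spectral unfolding, for a stand-in symmetric matrix and compares them with the GOE surmise of Atas et al. (2013). The function names are ours, and a random matrix stands in for a real network loss Hessian.

```python
import numpy as np

# Minimal sketch: compare the local spectral statistics of a (stand-in)
# Hessian with Gaussian Orthogonal Ensemble (GOE) predictions, using
# consecutive spacing ratios r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}),
# which probe local statistics without the unfolding needed for raw spacings.

def spacing_ratios(eigenvalues):
    """Consecutive nearest-neighbour spacing ratios of a spectrum."""
    lam = np.sort(eigenvalues)
    s = np.diff(lam)
    s = s[s > 0]  # guard against exact degeneracies
    return np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])

def goe_ratio_density(r):
    """GOE surmise density for the min/max spacing ratio on [0, 1].

    Folded form of P(r) = (27/8)(r + r^2) / (1 + r + r^2)^(5/2)
    (Atas et al., 2013); the factor 2 accounts for folding r -> 1/r.
    """
    return 2.0 * (27.0 / 8.0) * (r + r**2) / (1.0 + r + r**2) ** 2.5

# Stand-in "Hessian": here a GOE matrix, so agreement is by construction.
# In practice one would insert the eigenvalues of an actual loss Hessian.
rng = np.random.default_rng(0)
n = 1000
a = rng.normal(size=(n, n))
hessian = (a + a.T) / np.sqrt(2.0 * n)

ratios = spacing_ratios(np.linalg.eigvalsh(hessian))
print(f"empirical mean ratio: {ratios.mean():.4f}")
print("GOE reference value:  ~0.5359")
```

For a real Hessian, a mean ratio near the GOE value of about 0.536 indicates GOE-like level repulsion, whereas a drift toward the Poisson value of roughly 0.386 would indicate statistically independent eigenvalues.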