It is well known that the Hessian of the deep loss landscape matters to the optimization, generalization, and even robustness of deep learning. Recent works have empirically discovered that the Hessian spectrum in deep learning has a two-component structure, consisting of a small number of large eigenvalues and a large number of nearly zero eigenvalues. However, the theoretical mechanism or mathematics behind the Hessian spectrum is still largely under-explored. To the best of our knowledge, we are the first to demonstrate that the Hessian spectra of well-trained deep neural networks exhibit simple power-law structures. Inspired by theories from statistical physics and the spectral analysis of natural proteins, we provide a maximum-entropy theoretical interpretation that explains why the power-law structure exists, and we suggest a spectral parallel between protein evolution and the training of deep neural networks. By conducting extensive experiments, we further use the power-law spectral framework as a useful tool to explore multiple novel behaviors of deep learning.
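To make the claimed structure concrete: a power-law spectrum means the sorted Hessian eigenvalues decay roughly as λ_k ≈ λ_1 k^(-β) for some exponent β. The sketch below is a minimal illustration of how one might measure this, not the paper's actual procedure: it estimates the top-k Hessian eigenvalues with deflated power iteration on Hessian-vector products (PyTorch double backward) and fits β by least squares in log-log space. The function names, the deflation scheme, and the hyperparameters are illustrative assumptions.

```python
import torch
import numpy as np

def hessian_eigenvalues(loss, params, k=20, iters=100):
    """Estimate the top-k Hessian eigenvalues of `loss` w.r.t. `params`
    via power iteration with deflation, using Hessian-vector products.
    A rough sketch: assumes the top-of-spectrum eigenvalues are positive
    and dominant, as is typical for a well-trained network's loss."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    n = flat_grad.numel()

    def hvp(v):
        # Hessian-vector product: differentiate (grad . v) w.r.t. params.
        prod = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        return torch.cat([p.reshape(-1) for p in prod])

    eigvals, eigvecs = [], []
    for _ in range(k):
        v = torch.randn(n, device=flat_grad.device)
        v = v / v.norm()
        for _ in range(iters):
            w = hvp(v)
            # Deflate previously found eigen-directions:
            # apply (H - sum_i lam_i u_i u_i^T) v instead of H v.
            for lam, u in zip(eigvals, eigvecs):
                w = w - lam * (u @ v) * u
            v = w / w.norm()
        # Rayleigh quotient gives the eigenvalue estimate for direction v.
        eigvals.append((v @ hvp(v)).item())
        eigvecs.append(v.detach())
    return np.array(eigvals)

def fit_power_law(eigvals):
    """Fit lambda_k ~ lambda_1 * k^(-beta) by linear regression in
    log-log space; returns (beta, estimated lambda_1)."""
    eigvals = np.asarray(eigvals)
    k = np.arange(1, len(eigvals) + 1)
    mask = eigvals > 0  # log requires positive eigenvalues
    slope, intercept = np.polyfit(np.log(k[mask]), np.log(eigvals[mask]), 1)
    return -slope, np.exp(intercept)
```

Plotting log λ_k against log k under this scheme would show the reported signature: a straight line whose slope is -β, in contrast to the flat bulk of nearly zero eigenvalues.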