The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss. It is a fundamental object of study, closely tied to various problems in deep learning, including model design, optimization, and generalization. Most prior work has been empirical, typically focusing on low-rank approximations and heuristics that are blind to the network structure. In contrast, we develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency as well as the structural reasons behind it. This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks, allowing for an elegant interpretation of the rank deficiency. Moreover, we demonstrate that our bounds remain faithful as estimates of the numerical Hessian rank for a larger class of models, such as rectified and hyperbolic-tangent networks. Further, we investigate the implications of model architecture (e.g.~width, depth, bias) on the rank deficiency. Overall, our work provides novel insights into the source and extent of redundancy in overparameterized networks.
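As a concrete illustration of the quantity studied here (our sketch, not the paper's method), the following PyTorch snippet numerically estimates the Hessian rank of a small two-layer linear network. The layer sizes, the synthetic data, and the mean-squared loss are assumptions chosen for demonstration; with a hidden layer wider than the input and output, the estimated rank falls below the parameter count, exhibiting the rank deficiency discussed above.

```python
import torch

# Minimal sketch: numerically estimate the Hessian rank of a small deep
# linear network on synthetic data. Sizes, data, and loss are illustrative.
torch.manual_seed(0)
torch.set_default_dtype(torch.float64)  # double precision for a stable rank estimate
n, d_in, d_hid, d_out = 32, 4, 8, 2
X = torch.randn(n, d_in)
Y = torch.randn(n, d_out)

# Two-layer linear network without biases: f(x) = W2 W1 x.
W1 = torch.randn(d_hid, d_in)
W2 = torch.randn(d_out, d_hid)

def loss(w1, w2):
    # Mean-squared error of the network output against the targets.
    return ((X @ w1.T @ w2.T - Y) ** 2).mean()

# Assemble the full parameter Hessian from the per-block second derivatives.
H = torch.autograd.functional.hessian(loss, (W1, W2))
params = (W1, W2)
blocks = [[H[i][j].reshape(params[i].numel(), params[j].numel())
           for j in range(2)] for i in range(2)]
H_full = torch.cat([torch.cat(row, dim=1) for row in blocks], dim=0)

# The gap between the two printed numbers is the rank deficiency: with a
# wide hidden layer, the Hessian rank stays below the parameter count.
print("parameter count:       ", sum(p.numel() for p in params))
print("numerical Hessian rank:", torch.linalg.matrix_rank(H_full).item())
```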