We consider deep neural networks for classification. We expose a structure in the derivative of the logits with respect to the model's parameters, which we use to explain the emergence of outliers in the spectrum of the Hessian. Previous works decomposed the Hessian into two components, attributing the outliers to one of them, the so-called covariance of gradients. We show that this term is not a covariance but a second-moment matrix, i.e., it is influenced by the means of the gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be exploited to approximate the principal subspace of the Hessian using certain "averaging" operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures, and sample sizes.
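The covariance vs. second-moment distinction can be illustrated with a toy numerical sketch (my own illustration, not the paper's experiments): when per-sample gradients have a nonzero mean, the uncentered second-moment matrix picks up a rank-one term from the outer product of that mean, producing a single spectral outlier that the true (centered) covariance lacks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 20                          # number of samples, parameter dimension
mean = 3.0 * rng.standard_normal(d)      # nonzero mean of the per-sample "gradients"
G = mean + rng.standard_normal((n, d))   # toy gradients: isotropic noise around the mean

M = G.T @ G / n                          # second-moment matrix (uncentered)
m_hat = G.mean(axis=0)
C = M - np.outer(m_hat, m_hat)           # covariance (centered)

eig_M = np.linalg.eigvalsh(M)[::-1]      # eigenvalues, descending
eig_C = np.linalg.eigvalsh(C)[::-1]

# The rank-one term mean mean^T adds one large outlier to M's spectrum;
# centering removes it, so C's spectrum is nearly flat.
print(eig_M[0] / eig_M[1])               # large ratio: outlier present
print(eig_C[0] / eig_C[1])               # close to 1: no outlier
```

This mirrors the abstract's point in miniature: attributing the outliers to a "covariance" is misleading when the matrix in question is uncentered, since the means alone can generate the outlier.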