The goal of this work is to shed light on the remarkable phenomenon of transition to linearity of certain neural networks as their width approaches infinity. We show that the transition to linearity of the model and, equivalently, constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network as a function of the network width. We present a general framework for understanding the constancy of the tangent kernel via Hessian scaling applicable to the standard classes of neural networks. Our analysis provides a new perspective on the phenomenon of constant tangent kernel, which is different from the widely accepted "lazy training". Furthermore, we show that the transition to linearity is not a general property of wide neural networks and does not hold when the last layer of the network is non-linear. It is also not necessary for successful optimization by gradient descent.
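As a concrete illustration of the claimed mechanism, the following is a minimal numerical sketch (not taken from the paper's code): for a two-layer network f(W, v; x) = v·tanh(Wx)/√m, it estimates the spectral norm of the output Hessian and the change of the empirical tangent kernel under a unit-norm parameter perturbation as the width m grows. The functions `make_net` and `ntk`, the activation, and all constants are illustrative assumptions, not the paper's implementation.

```python
# Sketch (assumed setup): two-layer net in the standard 1/sqrt(m) parameterization.
# Expectation, per the abstract's claim: the Hessian spectral norm shrinks with
# width m, and correspondingly the empirical tangent kernel barely changes under
# a unit-norm parameter perturbation.
import jax
import jax.numpy as jnp


def make_net(m, d, key):
    """Return a flat parameter vector and f(theta, x) = v . tanh(W x) / sqrt(m)."""
    k_w, k_v = jax.random.split(key)
    W = jax.random.normal(k_w, (m, d))
    v = jax.random.normal(k_v, (m,))
    theta0 = jnp.concatenate([W.ravel(), v])

    def f(theta, x):
        W = theta[: m * d].reshape(m, d)
        v = theta[m * d:]
        return jnp.dot(v, jnp.tanh(W @ x)) / jnp.sqrt(m)

    return theta0, f


def ntk(f, theta, x1, x2):
    """Empirical tangent kernel K(x1, x2) = <df/dtheta(x1), df/dtheta(x2)>."""
    g = jax.grad(f)
    return jnp.dot(g(theta, x1), g(theta, x2))


d = 5
x1 = jax.random.normal(jax.random.PRNGKey(1), (d,))
x2 = jax.random.normal(jax.random.PRNGKey(2), (d,))

for m in [32, 128, 512]:
    theta0, f = make_net(m, d, jax.random.PRNGKey(0))
    # Spectral norm of the output Hessian at initialization; the argument is
    # that this norm decays with width, which forces near-constancy of the NTK.
    H = jax.hessian(f)(theta0, x1)
    hess_norm = float(jnp.linalg.norm(H, 2))
    # Unit-norm parameter perturbation; compare the tangent kernel before/after.
    delta = jax.random.normal(jax.random.PRNGKey(3), theta0.shape)
    theta1 = theta0 + delta / jnp.linalg.norm(delta)
    k0, k1 = float(ntk(f, theta0, x1, x2)), float(ntk(f, theta1, x1, x2))
    print(f"m={m:4d}  ||Hessian||_2={hess_norm:.4f}  "
          f"rel. NTK change={abs(k1 - k0) / abs(k0):.4f}")
```

Under these assumptions, increasing m should show both quantities decreasing together, which is the relationship between Hessian scaling and tangent-kernel constancy that the abstract describes; it is a toy check, not evidence of the paper's general result.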