The training dynamics and generalization properties of neural networks (NNs) can be precisely characterized in function space via the neural tangent kernel (NTK). Structural changes to the NTK during training reflect feature learning and underlie the superior performance of networks outside of the static kernel regime. In this work, we seek to theoretically understand kernel alignment, a prominent and ubiquitous structural change that aligns the NTK with the target function. We first study a toy model of kernel evolution in which the NTK evolves to accelerate training and show that alignment naturally emerges from this demand. We then study the alignment mechanism in deep linear networks and two-layer ReLU networks. These theories provide good qualitative descriptions of kernel alignment and specialization in practical networks and identify factors in network architecture and data structure that drive kernel alignment. In nonlinear networks with multiple outputs, we identify the phenomenon of kernel specialization, where the kernel function for each output head preferentially aligns to its own target function. Together, our results provide a mechanistic explanation of how kernel alignment emerges during NN training and a normative explanation of how it benefits training.
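For concreteness, the central quantities can be written as follows. This is a minimal sketch assuming the conventional empirical NTK and the widely used cosine-similarity kernel-target alignment measure; the paper's exact normalization may differ:
$$
\Theta(x, x') \;=\; \nabla_\theta f(x;\theta)^\top \nabla_\theta f(x';\theta),
\qquad
A(K, y) \;=\; \frac{\langle K,\, y y^\top \rangle_F}{\|K\|_F \,\|y y^\top\|_F} \;=\; \frac{y^\top K y}{\|K\|_F \,\|y\|_2^2},
$$
where $f(x;\theta)$ is the network output, $K_{ij} = \Theta(x_i, x_j)$ is the NTK Gram matrix on the training set, and $y$ is the vector of target values. Kernel alignment then refers to the growth of $A(K, y)$ over the course of training, and kernel specialization to each output head's Gram matrix $K^{(c)}$ preferentially aligning with its own target vector $y^{(c)}$.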