To understand the training dynamics of neural networks (NNs), prior studies have considered the infinite-width mean-field (MF) limit of two-layer NNs, establishing theoretical guarantees for its convergence under gradient-flow training as well as for its approximation and generalization capabilities. In this work, we study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed. To define the limiting model rigorously, we generalize the MF theory of two-layer NNs by treating the neurons as belonging to functional spaces. Then, by writing the MF training dynamics as a kernel gradient flow with a time-varying kernel that remains positive definite, we prove that the training loss for $L_2$ regression decays to zero at a linear rate. Furthermore, we define function spaces that include the solutions obtainable through the MF training dynamics and prove Rademacher complexity bounds for these spaces. Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinct behaviors while both exhibiting feature learning.
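As an illustrative sketch of the model class described above (the widths $m_1, m_2$, weights $v_i, w_{ji}, a_j$, and scaling exponents $\gamma_1, \gamma_2$ below are placeholder notation introduced here, not the paper's; the actual scalings are among the choices the theory accommodates), a partially-trained three-layer network with a random, fixed first layer can be written as
\[
f(x) \;=\; \frac{1}{m_2^{\gamma_2}} \sum_{j=1}^{m_2} a_j\, \sigma\!\Big( \frac{1}{m_1^{\gamma_1}} \sum_{i=1}^{m_1} w_{ji}\, \sigma\big( v_i^\top x \big) \Big),
\]
where the first-layer weights $v_i$ are sampled at random and held fixed while $(w_{ji}, a_j)$ are trained by gradient flow; the MF limit takes $m_1, m_2 \to \infty$, and different choices of the scaling give rise to the two regimes mentioned above.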