We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory. We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, providing a reduced description of network activity through training. These kernel order parameters collectively define the hidden layer activation distribution, the evolution of the neural tangent kernel (NTK), and consequently the output predictions. We show that the field theory derivation recovers the recursive stochastic process of infinite-width feature learning networks obtained by Yang and Hu (2021) with Tensor Programs. For deep linear networks, these kernels satisfy a set of algebraic matrix equations. For nonlinear networks, we provide an alternating sampling procedure to self-consistently solve for the kernel order parameters. We compare the self-consistent solution to various approximation schemes, including the static NTK approximation, the gradient independence assumption, and leading-order perturbation theory, showing that each of these approximations can break down in regimes where the general self-consistent solution still provides an accurate description. Lastly, we provide experiments in more realistic settings which demonstrate that the loss and kernel dynamics of CNNs at fixed feature learning strength are preserved across different widths on a CIFAR classification task.
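The alternating sampling procedure mentioned above can be illustrated, in heavily simplified form, by a damped fixed-point loop: sample Gaussian realizations of the single-site field with covariance set by the current kernel guess, re-estimate the kernel as a Monte Carlo average over those samples, and repeat until the kernel stabilizes. The sketch below is an assumption-laden toy, not the paper's algorithm: it uses a tanh nonlinearity, a hypothetical base kernel `Phi_in` standing in for the deterministic forward-pass drive, and omits the drift and memory terms that couple activations and gradients across time in the full self-consistent theory. It demonstrates only the sample-then-average structure of such a solver.

```python
import numpy as np

def phi(h):
    """Hidden-unit nonlinearity (tanh chosen purely for illustration)."""
    return np.tanh(h)

def solve_self_consistent_kernel(Phi_in, n_samples=5000, n_iters=50,
                                 damping=0.5, tol=1e-3, seed=0):
    """Damped fixed-point iteration for Phi = E[phi(h) phi(h)^T], h ~ N(0, Phi_in + Phi)."""
    rng = np.random.default_rng(seed)
    T = Phi_in.shape[0]
    Phi = np.eye(T)                               # initial guess for the kernel
    for _ in range(n_iters):
        cov = Phi_in + Phi + 1e-8 * np.eye(T)     # jitter for numerical stability
        h = rng.multivariate_normal(np.zeros(T), cov, size=n_samples)
        Phi_new = phi(h).T @ phi(h) / n_samples   # Monte Carlo kernel estimate
        if np.max(np.abs(Phi_new - Phi)) < tol:
            return Phi_new
        Phi = (1 - damping) * Phi + damping * Phi_new  # damped kernel update
    return Phi

if __name__ == "__main__":
    T = 10                                        # number of time points
    ts = np.linspace(0.0, 1.0, T)
    # Hypothetical stand-in base kernel over pairs of time points.
    Phi_in = np.exp(-np.abs(ts[:, None] - ts[None, :]))
    Phi = solve_self_consistent_kernel(Phi_in)
    print(np.round(np.diag(Phi), 3))
```

The damping factor trades convergence speed for stability of the fixed-point iteration; the Monte Carlo sample count controls the noise floor of the kernel estimate.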