We explicitly construct the quantum field theory corresponding to a general class of deep neural networks encompassing both recurrent and feedforward architectures. We first consider the mean-field theory (MFT) obtained as the leading saddle point in the action, and derive the condition for criticality via the largest Lyapunov exponent. We then compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$, and find a precise analogy with the well-studied $O(N)$ vector model, in which the variance of the weight initializations plays the role of the 't Hooft coupling. In particular, we compute both the $\mathcal{O}(1)$ corrections quantifying fluctuations from typicality in the ensemble of networks, and the subleading $\mathcal{O}(T/N)$ corrections due to finite-width effects. These provide corrections to the correlation length that controls the depth to which information can propagate through the network, and thereby sets the scale at which such networks are trainable by gradient descent. Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues for the study of criticality in deep neural networks.
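As an illustrative aside (not part of the paper itself), the criticality criterion invoked above — the largest Lyapunov exponent crossing zero — can be estimated numerically for a random network at initialization. The sketch below is a minimal, hypothetical example assuming a feedforward $\tanh$ network with untied Gaussian weights of variance $\sigma_w^2/N$ and zero biases; the function name and all parameter choices are illustrative, not taken from the paper. It applies the standard trick of tracking the growth of a small, repeatedly renormalized perturbation.

```python
import numpy as np

def lyapunov_exponent(sigma_w, N=300, T=20, eps=1e-6, seed=0):
    """Estimate the largest Lyapunov exponent of a random tanh network
    at initialization: the mean per-layer log-growth rate of a small
    perturbation to the input, lambda = (1/T) log(|dh_T| / |dh_0|).

    Weights are untied (resampled each layer) Gaussians with variance
    sigma_w^2 / N; biases are zero. This is an illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(N)            # random input
    delta = eps * rng.standard_normal(N)  # small perturbation
    log_growth = 0.0
    for _ in range(T):
        W = rng.standard_normal((N, N)) * sigma_w / np.sqrt(N)
        h_next = np.tanh(W @ h)
        d = np.tanh(W @ (h + delta)) - h_next
        norm = np.linalg.norm(d)
        log_growth += np.log(norm / np.linalg.norm(delta))
        delta = d * (eps / norm)          # renormalize to stay infinitesimal
        h = h_next
    return log_growth / T
```

In this zero-bias setting the mean-field prediction is an ordered phase ($\lambda < 0$) for $\sigma_w < 1$, a chaotic phase ($\lambda > 0$) for $\sigma_w > 1$, and criticality at $\sigma_w = 1$, which the estimator reproduces at moderate $N$ and $T$.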