Understanding the capabilities and limitations of different network architectures is of fundamental importance to machine learning. Bayesian inference on Gaussian processes has proven to be a viable approach for studying recurrent and deep networks in the limit of infinite layer width, $n\to\infty$. Here we present a unified and systematic derivation of the mean-field theory for both architectures that starts from first principles by employing established methods from the statistical physics of disordered systems. The theory elucidates that while the mean-field equations differ in their temporal structure, they nevertheless yield identical Gaussian kernels when readouts are taken at a single time point or layer, respectively. Bayesian inference applied to classification then predicts identical performance and capabilities for the two architectures. Numerically, we find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks, and that the convergence speed depends non-trivially on the parameters of the weight prior as well as on the depth or number of time steps, respectively. Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$. The formalism thus paves the way to investigate the fundamental differences between recurrent and deep architectures at finite widths $n$.
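To make the central claim concrete, the following is a minimal numerical sketch (not the paper's code): it compares the empirical kernels of a deep network (fresh weights and biases in every layer) against a recurrent network (one weight matrix and bias shared across all time steps) and against their common mean-field kernel, using an erf nonlinearity whose Gaussian expectation is known in closed form (Williams, 1998). All hyperparameters (sigma_w, sigma_b, width n, depth L, number of Monte Carlo trials) are illustrative assumptions, not values from the paper.

```python
# Sketch of the abstract's claim: for n -> infinity, a deep network with
# fresh weights per layer and a recurrent network with one shared weight
# matrix induce the same Gaussian-process kernel when read out at a single
# layer / time point. Hyperparameters below are illustrative assumptions.
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)
sigma_w, sigma_b = 1.2, 0.3   # std of the weight and bias priors (assumed)
L = 5                         # depth / number of time steps
n = 1000                      # layer width (finite, so deviations remain)
d = 5                         # input dimension
trials = 100                  # Monte Carlo samples over the weight prior

x1 = rng.standard_normal(d)
x2 = rng.standard_normal(d)

def empirical_overlap(recurrent: bool) -> float:
    """Average overlap <h(x1) . h(x2)>/n of the states at layer/time L."""
    acc = 0.0
    for _ in range(trials):
        W_in = rng.standard_normal((n, d)) * sigma_w / np.sqrt(d)
        W = rng.standard_normal((n, n)) * sigma_w / np.sqrt(n)
        b = rng.standard_normal(n) * sigma_b
        h1, h2 = erf(W_in @ x1 + b), erf(W_in @ x2 + b)
        for _ in range(L - 1):
            if not recurrent:  # deep net: fresh weights and biases per layer
                W = rng.standard_normal((n, n)) * sigma_w / np.sqrt(n)
                b = rng.standard_normal(n) * sigma_b
            h1, h2 = erf(W @ h1 + b), erf(W @ h2 + b)
        acc += h1 @ h2 / n
    return acc / trials

def phi_cov(K, a, c):
    """<erf(u) erf(v)> for (u, v) ~ N(0, K) in closed form (Williams, 1998)."""
    return (2 / np.pi) * np.arcsin(
        2 * K[a, c] / np.sqrt((1 + 2 * K[a, a]) * (1 + 2 * K[c, c])))

def mean_field_overlap() -> float:
    """Iterate the shared kernel recursion K -> sigma_w^2 * C_phi + sigma_b^2."""
    X = np.stack([x1, x2])
    K = sigma_w**2 * (X @ X.T) / d + sigma_b**2  # layer-1 pre-activation covariance
    for _ in range(L - 1):
        C = np.array([[phi_cov(K, a, c) for c in range(2)] for a in range(2)])
        K = sigma_w**2 * C + sigma_b**2
    return phi_cov(K, 0, 1)

print("mean field:", mean_field_overlap())
print("deep      :", empirical_overlap(recurrent=False))
print("recurrent :", empirical_overlap(recurrent=True))
```

With these settings all three numbers should agree up to Monte Carlo and finite-size error; consistent with the abstract, one expects the recurrent estimate to converge more slowly towards the mean-field value as $n$ grows.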