Understanding the capabilities and limitations of different network architectures is of fundamental importance to machine learning. Bayesian inference on Gaussian processes has proven to be a viable approach for studying recurrent and deep networks in the limit of infinite layer width, $n\to\infty$. Here we present a unified and systematic derivation of the mean-field theory for both architectures that starts from first principles by employing established methods from the statistical physics of disordered systems. The theory elucidates that while the mean-field equations differ with regard to their temporal structure, they nonetheless yield identical Gaussian kernels when readouts are taken at a single time point or layer, respectively. Bayesian inference applied to classification then predicts identical performance and capabilities for the two architectures. Numerically, we find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks, and that the convergence speed depends non-trivially on the parameters of the weight prior as well as on the depth or number of time steps, respectively. Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$, and we compute next-to-leading-order corrections which turn out to be architecture-specific. The formalism thus paves the way to investigate the fundamental differences between recurrent and deep architectures at finite widths $n$.
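A minimal numerical sketch (not the paper's code) of the leading-order claim: with matching Gaussian priors, the single-layer readout kernel of a deep network and the single-time readout kernel of a recurrent network obey the same mean-field recursion, with deviations due to weight sharing appearing only at order $1/n$. The ReLU nonlinearity, the specific prior variances, and the single-input setup are assumptions chosen here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 1000, 10                 # layer width and depth / number of time steps (illustrative values)
sigma_w2, sigma_b2 = 1.5, 0.3   # assumed weight and bias prior variances
K0 = 3.0                        # assumed input kernel (pre-activation variance)
phi = lambda h: np.maximum(h, 0.0)   # ReLU

def mean_field_kernel(K0, steps):
    """Leading-order (n -> infinity) recursion for the single-point kernel.
    For ReLU, E_{z ~ N(0, K)}[phi(z)^2] = K / 2."""
    K = K0
    for _ in range(steps):
        K = sigma_b2 + sigma_w2 * K / 2.0
    return K

def sample_deep(K0):
    """One wide deep network: an independent weight matrix in every layer."""
    h = np.sqrt(K0) * rng.standard_normal(n)          # input pre-activation with variance K0
    for _ in range(L):
        W = rng.standard_normal((n, n)) * np.sqrt(sigma_w2 / n)
        b = rng.standard_normal(n) * np.sqrt(sigma_b2)
        h = W @ phi(h) + b
    return np.mean(h**2)                              # empirical kernel at depth L

def sample_recurrent(K0):
    """One wide recurrent network: the same weight matrix reused at every time step."""
    h = np.sqrt(K0) * rng.standard_normal(n)
    W = rng.standard_normal((n, n)) * np.sqrt(sigma_w2 / n)
    b = rng.standard_normal(n) * np.sqrt(sigma_b2)
    for _ in range(L):
        h = W @ phi(h) + b
    return np.mean(h**2)                              # empirical kernel at time step L

print("mean-field kernel :", mean_field_kernel(K0, L))
print("deep network      :", np.mean([sample_deep(K0) for _ in range(10)]))
print("recurrent network :", np.mean([sample_recurrent(K0) for _ in range(10)]))
```

At this width both empirical estimates should agree with the mean-field value up to fluctuations and $1/n$ corrections; the temporal cross-correlations of the recurrent network, which do differ from the deep case, are not probed by this single-time readout.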