Wide neural networks: From non-Gaussian random fields at initialization to the NTK geometry of training

Recent developments in applications of artificial neural networks with over $n=10^{14}$ parameters make it extremely important to study the large-$n$ behavior of such networks. Most works studying wide neural networks have focused on the infinite-width limit $n \to +\infty$ and have shown that, at initialization, such networks correspond to Gaussian processes. In this work we study their behavior for large but finite $n$. Our main contributions are the following: (1) The computation of the corrections to Gaussianity in terms of an asymptotic series in $n^{-\frac{1}{2}}$. The coefficients in this expansion are determined by the statistics of parameter initialization and by the activation function. (2) Control of the evolution of the outputs of finite-width $n$ networks during training, obtained by computing their deviations from the limiting infinite-width case (in which the network evolves through a linear flow). This improves previous estimates and yields sharper decay rates in $n$ for the (finite-width) NTK, valid during the entire training procedure. As a corollary, we also prove that, with arbitrarily high probability, the training of sufficiently wide neural networks converges to a global minimum of the corresponding quadratic loss function. (3) An estimate, in terms of $n$, of how the deviations from Gaussianity evolve with training. In particular, using a certain metric on the space of measures, we find that along training the resulting measure is within $n^{-\frac{1}{2}}(\log n)^{1+}$ of the time-dependent Gaussian process corresponding to the infinite-width network (which is explicitly given by precomposing the initial Gaussian process with the linear flow corresponding to training in the infinite-width limit).
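The approach to Gaussianity behind contribution (1) can be seen numerically in a toy setting. The sketch below rests on assumptions that are not from the paper: a hypothetical one-hidden-layer network $f(x) = n^{-1/2}\sum_{i=1}^{n} v_i \tanh(w_i x)$ with i.i.d. $N(0,1)$ weights, a single scalar input, and the sample excess kurtosis (zero for a Gaussian) as the measure of non-Gaussianity. Because $f$ is a rescaled sum of $n$ i.i.d. symmetric terms, its third cumulant vanishes and its population excess kurtosis equals that of a single term divided by $n$. This only illustrates the shrinking non-Gaussianity; it does not compute the paper's asymptotic series.

```python
import math
import random


def sample_outputs(width, num_samples, x=1.0, seed=0):
    """Outputs f(x) of independently initialized toy networks (hypothetical model):
    f(x) = width^{-1/2} * sum_i v_i * tanh(w_i * x), with v_i, w_i i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(width)
    outputs = []
    for _ in range(num_samples):
        # Each hidden unit contributes v_i * tanh(w_i * x) with fresh Gaussian weights.
        total = sum(
            rng.gauss(0.0, 1.0) * math.tanh(rng.gauss(0.0, 1.0) * x)
            for _ in range(width)
        )
        outputs.append(scale * total)
    return outputs


def excess_kurtosis(samples):
    """Sample excess kurtosis m4 / m2^2 - 3 from central moments; 0 for a Gaussian."""
    m = len(samples)
    mean = sum(samples) / m
    m2 = sum((s - mean) ** 2 for s in samples) / m
    m4 = sum((s - mean) ** 4 for s in samples) / m
    return m4 / m2**2 - 3.0


if __name__ == "__main__":
    for width in (1, 10, 100):
        k = excess_kurtosis(sample_outputs(width, num_samples=10000))
        print(f"width={width:4d}  excess kurtosis ~ {k:+.3f}")
```

At width 1 the output has heavier tails than a Gaussian (positive excess kurtosis); as the width grows the statistic should fall roughly like 1/width, although at the largest width the Monte Carlo noise dominates unless many more samples are drawn.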
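Contribution (2) compares finite-width training with the linear flow of the infinite-width limit. As a rough sketch, assume the same hypothetical tanh network and the quadratic loss $\tfrac12\sum_j (f(x_j)-y_j)^2$ minimized by gradient flow. Then the outputs on the training inputs satisfy $\dot f = -\Theta\,(f - y)$, where $\Theta_{ab} = \langle \nabla_\theta f(x_a), \nabla_\theta f(x_b)\rangle$ is the empirical NTK Gram matrix. The code below freezes $\Theta$ at its initial value (the linearized dynamics) and integrates the flow with explicit Euler steps. It also shows that the spread of an NTK entry across random initializations shrinks as the width grows. The inputs, targets, step size and all function names are illustrative assumptions, not taken from the paper.

```python
import math
import random


def init_params(width, rng):
    """Hypothetical init: output weights v_i and input weights w_i i.i.d. N(0, 1)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(width)]
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return v, w


def forward(v, w, x):
    """Toy network output f(x) = n^{-1/2} * sum_i v_i * tanh(w_i * x) for scalar x."""
    return sum(vi * math.tanh(wi * x) for vi, wi in zip(v, w)) / math.sqrt(len(v))


def param_gradient(v, w, x):
    """Gradient of f(x) w.r.t. all parameters, ordered (v_1..v_n, w_1..w_n)."""
    scale = 1.0 / math.sqrt(len(v))
    d_v = [scale * math.tanh(wi * x) for wi in w]  # df/dv_i
    d_w = [scale * vi * x * (1.0 - math.tanh(wi * x) ** 2)  # df/dw_i
           for vi, wi in zip(v, w)]
    return d_v + d_w


def empirical_ntk(v, w, xs):
    """Gram matrix Theta[a][b] = <grad f(x_a), grad f(x_b)> over all parameters."""
    grads = [param_gradient(v, w, x) for x in xs]
    return [[sum(p * q for p, q in zip(ga, gb)) for gb in grads] for ga in grads]


def linear_flow(theta, f0, ys, dt=0.5, steps=4000):
    """Explicit Euler steps of df/dt = -Theta (f - y), i.e. gradient flow on
    0.5 * sum (f - y)^2 with the kernel frozen at its initial value."""
    f = list(f0)
    for _ in range(steps):
        r = [fa - ya for fa, ya in zip(f, ys)]
        f = [fa - dt * sum(t_ab * r_b for t_ab, r_b in zip(row, r))
             for fa, row in zip(f, theta)]
    return f


def ntk_spread(width, x=1.0, seeds=20):
    """Standard deviation of Theta(x, x) over independent random initializations."""
    vals = []
    for seed in range(seeds):
        v, w = init_params(width, random.Random(seed))
        vals.append(empirical_ntk(v, w, [x])[0][0])
    mean = sum(vals) / len(vals)
    return math.sqrt(sum((val - mean) ** 2 for val in vals) / len(vals))


if __name__ == "__main__":
    for width in (10, 1000):
        print(f"width={width}: std of Theta(1,1) over seeds = {ntk_spread(width):.4f}")
    xs, ys = [0.5, 2.0], [1.0, -1.0]
    v, w = init_params(1000, random.Random(0))
    theta = empirical_ntk(v, w, xs)
    f0 = [forward(v, w, x) for x in xs]
    print("outputs at init:", f0)
    print("after linear flow:", linear_flow(theta, f0, ys))
```

With the kernel frozen and positive definite, the residual $f - y$ decays like $e^{-\lambda_{\min}(\Theta)\,t}$, so the outputs reach the targets and the quadratic loss reaches its global minimum of zero. The paper's finite-width results concern how far actual training stays from this idealized flow.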


Related content

Gaussian process


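A Gaussian process (GP) is a kind of stochastic process in probability theory and mathematical statistics: a collection of normally distributed random variables indexed by an index set. Every finite linear combination of the random variables in a Gaussian process is normally distributed, and every finite-dimensional distribution is a joint normal distribution. Over a continuous index set, the distribution of the process itself is a Gaussian measure on all of the random variables jointly, so a Gaussian process can be regarded as the infinite-dimensional generalization of the joint normal distribution. A Gaussian process is completely determined by its mean function and covariance function, and it inherits many properties of the normal distribution.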
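Since a Gaussian process is determined by its mean and covariance functions, drawing its values at finitely many points reduces to sampling a multivariate normal vector. A minimal sketch, assuming a zero mean and a squared-exponential covariance (both illustrative choices, not the kernel of any particular network), factors the covariance matrix as $K = LL^\top$ by Cholesky decomposition and returns $\mu + Lz$ with $z$ standard normal. A small diagonal jitter is added for numerical stability.

```python
import math
import random


def squared_exponential(x, y, length_scale=1.0):
    """Illustrative covariance k(x, y) = exp(-(x - y)^2 / (2 * length_scale^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale**2))


def cholesky(a):
    """Return lower-triangular L with L L^T == a for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sample_gp(xs, mean_fn, cov_fn, num_draws, rng, jitter=1e-9):
    """Draw num_draws samples of (f(x_1), ..., f(x_m)) ~ N(mu, K), K[a][b] = cov_fn(x_a, x_b)."""
    m = len(xs)
    cov = [[cov_fn(xa, xb) + (jitter if a == b else 0.0) for b, xb in enumerate(xs)]
           for a, xa in enumerate(xs)]
    low = cholesky(cov)  # K = L L^T, so L z has covariance K when z ~ N(0, I)
    mu = [mean_fn(x) for x in xs]
    draws = []
    for _ in range(num_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        draws.append([mu[a] + sum(low[a][k] * z[k] for k in range(a + 1))
                      for a in range(m)])
    return draws


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    draws = sample_gp(xs, lambda x: 0.0, squared_exponential,
                      num_draws=3, rng=random.Random(0))
    for d in draws:
        print([round(val, 3) for val in d])
```

Any positive-definite covariance function can be passed as `cov_fn`; the sampler itself never needs anything beyond the mean and covariance.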
