We consider optimizing two-layer neural networks in the mean-field regime, where the learning dynamics of the network weights can be approximated by an evolution in the space of probability measures over the weight parameters associated with the neurons. The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime, in which the dynamics are restricted to a local neighborhood of specialized initializations in the so-called neural tangent kernel space. Several prior works (\cite{chizat2018global, mei2018mean}) establish the asymptotic global optimality of the mean-field regime, but obtaining a quantitative convergence rate remains challenging because of the complicated unbounded nonlinearity of the training dynamics. This work establishes the first linear convergence result for vanilla two-layer neural networks trained by continuous-time noisy gradient descent in the mean-field regime. Our result relies on a novel time-dependent estimate of the logarithmic Sobolev constants for a family of measures determined by the evolving distribution of hidden neurons.