This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data, with labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic, low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying analysis bridges different regimes of interest, such as the classical gradient-flow regime of vanishing learning rate, the high-dimensional regime of large input dimension, and the overparameterised "mean-field" regime of large network width, as well as the intermediate regimes in which the limiting dynamics is determined by the interplay between these behaviours. In particular, in the high-dimensional limit, the infinite-width dynamics is found to remain close to a low-dimensional subspace spanned by the target principal directions. Our results therefore provide a unifying picture of the limiting SGD dynamics with synthetic data.
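The setting described above can be illustrated with a minimal sketch: a "student" two-layer network trained by one-pass (online) SGD on fresh Gaussian samples, with labels produced by a fixed "teacher" network of the same form, and the low-dimensional sufficient statistics given by the student-student and student-teacher overlap matrices. All names, dimensions, and the tanh activation below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Hypothetical teacher-student setup: Gaussian inputs, one-pass SGD on
# squared loss, first-layer weights trained, second layer held fixed.
rng = np.random.default_rng(0)
d, p, k = 500, 4, 2        # input dimension, student width, teacher width
lr = 0.5                   # learning rate (rescaled by 1/d in the update)
act = np.tanh

W_teacher = rng.standard_normal((k, d)) / np.sqrt(d)  # fixed target weights
W = rng.standard_normal((p, d)) / np.sqrt(d)          # student first layer
W0 = W.copy()                                         # keep initialisation
a = np.ones(p) / p                                    # fixed second layer

def student(x):
    return a @ act(W @ x)

def teacher(x):
    return np.mean(act(W_teacher @ x))

for step in range(5000):
    x = rng.standard_normal(d)        # fresh Gaussian sample: one-pass SGD
    err = student(x) - teacher(x)     # residual of the squared loss
    # gradient of 0.5 * err**2 with respect to the first-layer weights
    grad = err * np.outer(a * (1.0 - act(W @ x) ** 2), x)
    W -= (lr / d) * grad              # high-dimensional learning-rate scaling

# Sufficient statistics ("order parameters") that close the limiting
# dynamics of the population risk in this kind of analysis:
Q = W @ W.T           # student-student overlaps, shape (p, p)
M = W @ W_teacher.T   # student-teacher overlaps, shape (p, k)
```

In the high-dimensional limit, the population risk depends on the weights only through `Q` and `M`, which is why the SGD dynamics admits a deterministic, low-dimensional description in these variables.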