We continue a long line of research aimed at proving convergence of depth-2 neural networks, trained via gradient descent, to a global minimum. As in many previous works, our model has the following features: regression with quadratic loss, fully connected feedforward architecture, ReLU activations, Gaussian data instances and network initialization, and adversarial labels. It is more general in that we allow both layers to be trained simultaneously and at {\em different} rates. Our results improve on the state of the art [Oymak Soltanolkotabi 20] (training the first layer only) and [Nguyen 21, Section 3.2] (training both layers with Le Cun's initialization). We also report several simple experiments with synthetic data. They strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the ``NTK regime''.
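To make the setup concrete, the following is a minimal sketch (not the paper's exact experiment) of the model described above: a depth-2 ReLU network $f(x) = a^\top \mathrm{ReLU}(Wx)$ with Gaussian data and Gaussian initialization, trained by full-batch gradient descent on the quadratic loss with both layers updated simultaneously at different rates. All dimensions, learning rates, step counts, and the label choice are illustrative assumptions, not values taken from the paper.

\begin{verbatim}
import numpy as np

# Minimal sketch of the setting: depth-2 ReLU network f(x) = a^T ReLU(W x),
# quadratic loss, Gaussian data and initialization, arbitrary ("adversarial")
# labels, both layers trained simultaneously at different learning rates.
# All hyperparameters below are illustrative assumptions.

rng = np.random.default_rng(0)
n, d, m = 100, 20, 500            # samples, input dim, hidden width (assumed)
X = rng.standard_normal((n, d))   # Gaussian data instances
y = rng.standard_normal(n)        # arbitrary labels

W = rng.standard_normal((m, d)) / np.sqrt(d)  # Gaussian initialization
a = rng.standard_normal(m) / np.sqrt(m)

eta_W, eta_a = 1e-2, 1e-3         # different rates for the two layers (assumed)

for step in range(2000):
    H = X @ W.T                   # pre-activations, shape (n, m)
    A = np.maximum(H, 0.0)        # ReLU activations
    pred = A @ a                  # network outputs, shape (n,)
    r = pred - y                  # residuals
    loss = 0.5 * np.mean(r ** 2)  # quadratic loss

    # Gradients of the quadratic loss with respect to both layers.
    grad_a = A.T @ r / n
    grad_W = ((r[:, None] * (H > 0)) * a[None, :]).T @ X / n

    # Simultaneous gradient-descent updates at different rates.
    W -= eta_W * grad_W
    a -= eta_a * grad_a

print(f"final training loss: {loss:.6f}")
\end{verbatim}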