In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime, where the number of observations is smaller than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for arbitrary training data of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients.
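To make the setting concrete, below is a minimal sketch (not the authors' code) of the realizable model from the last two sentences: Gaussian inputs, labels generated by a planted network with quadratic activations, and an over-parameterized student network fit by plain gradient descent. The specific network form f(x) = Σ_j (w_j·x)², with second-layer weights fixed to one, as well as all dimensions and the step size, are illustrative assumptions rather than the paper's exact configuration.

```python
# A minimal sketch (not the authors' code) of the planted, realizable
# setting described in the abstract: inputs drawn i.i.d. from a Gaussian,
# labels generated by a planted shallow network with quadratic
# activations, and an over-parameterized student fit by gradient descent.
# All dimensions and the step size are assumptions chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)

d, k_star, k, n = 10, 2, 20, 100   # input dim, planted width, student width, samples
X = rng.standard_normal((n, d))    # Gaussian inputs, one row per sample

W_star = rng.standard_normal((k_star, d))   # planted weight coefficients
y = np.sum((X @ W_star.T) ** 2, axis=1)     # labels: f*(x) = sum_j (w*_j . x)^2

W = 0.1 * rng.standard_normal((k, d))       # small random initialization

def loss_and_grad(W):
    """Squared loss of f(x) = sum_j (w_j . x)^2 and its gradient in W."""
    Z = X @ W.T                              # (n, k) pre-activations w_j . x_i
    resid = np.sum(Z ** 2, axis=1) - y       # f(x_i) - y_i
    loss = 0.5 * np.mean(resid ** 2)
    grad = 2.0 * (Z * resid[:, None]).T @ X / n   # chain rule through z -> z^2
    return loss, grad

eta = 1e-3                                   # step size (illustrative)
for _ in range(3000):
    loss, grad = loss_and_grad(W)
    W -= eta * grad

# In this over-parameterized regime (k * d > n) the training loss
# typically drives toward zero; shrink eta if the iterates diverge.
print(f"final training loss: {loss:.3e}")
```

Note that the student width k here exceeds the planted width k_star, matching the over-parameterized regime the abstract emphasizes; the sketch says nothing about the landscape or convergence guarantees themselves, which are the paper's actual contributions.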