We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions and with appropriately chosen initial conditions, the parameter values obtained using GD achieve zero training error and generalize well. Here, through a Lyapunov analysis, we show that the dynamics of the neural network weights under GD converge to a point close to the minimum norm solution subject to the constraint of zero training error for the linear approximation to the neural network. To illustrate the application of this result, we show that GD converges to a prediction function that generalizes well, thereby providing an alternative proof of the generalization results in Arora et al. (2019).
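As a rough sketch of the constrained problem referred to above (the notation here is assumed for illustration and not taken from the paper): writing $f(x;w)$ for the network, $w_0$ for the initialization, $\{(x_i,y_i)\}_{i=1}^n$ for the training data, and $f_{\mathrm{lin}}(x;w) = f(x;w_0) + \nabla_w f(x;w_0)^\top (w - w_0)$ for the linear approximation of the network around $w_0$, the limit point of GD is claimed to be close to the solution of
$$
\min_{w}\ \|w - w_0\|_2^2 \quad \text{subject to} \quad f_{\mathrm{lin}}(x_i;w) = y_i, \quad i = 1,\dots,n,
$$
that is, the interpolant of the training data under the linearized model whose weights have minimum norm relative to the initialization.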