In this paper, we present a convergence analysis of momentum methods for training a two-layer over-parameterized ReLU neural network, where the number of parameters is significantly larger than the number of training instances. Existing work on momentum methods shows that the heavy-ball method (HB) and Nesterov's accelerated gradient method (NAG) share the same limiting ordinary differential equation (ODE), which leads to identical convergence rates. From a high-resolution dynamical view, we show that HB in fact differs from NAG in its convergence rate. In addition, our findings provide tighter upper bounds on the convergence of the high-resolution ODEs of HB and NAG.
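As a point of reference, a minimal sketch of the dynamics at issue, assuming the standard high-resolution framework of Shi, Du, Jordan, and Su (2018) for a $\mu$-strongly convex objective $f$ with step size $s$ and momentum coefficient $\alpha = \frac{1-\sqrt{\mu s}}{1+\sqrt{\mu s}}$; the precise forms analyzed in this paper may differ. The discrete updates are
\[
\text{HB:}\quad x_{k+1} = x_k + \alpha\,(x_k - x_{k-1}) - s\,\nabla f(x_k),
\qquad
\text{NAG:}\quad
\begin{cases}
y_k = x_k + \alpha\,(x_k - x_{k-1}),\\
x_{k+1} = y_k - s\,\nabla f(y_k).
\end{cases}
\]
Both share the low-resolution limit $\ddot{X} + 2\sqrt{\mu}\,\dot{X} + \nabla f(X) = 0$, whereas the high-resolution ODEs retain $\mathcal{O}(\sqrt{s})$ terms and differ:
\[
\text{HB:}\quad \ddot{X} + 2\sqrt{\mu}\,\dot{X} + (1+\sqrt{\mu s})\,\nabla f(X) = 0,
\]
\[
\text{NAG:}\quad \ddot{X} + 2\sqrt{\mu}\,\dot{X} + \sqrt{s}\,\nabla^2 f(X)\,\dot{X} + (1+\sqrt{\mu s})\,\nabla f(X) = 0.
\]
The gradient-correction term $\sqrt{s}\,\nabla^2 f(X)\,\dot{X}$, present only in NAG's high-resolution ODE, is what allows the two methods' convergence rates to be distinguished.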