高分辨率动态观点 (A high-resolution dynamical view on momentum methods for over-parameterized neural networks)

Due to the simplicity and efficiency of the first-order gradient method, it has been widely used in training neural networks. Although the optimization problem of the neural network is non-convex, recent research has proved that the first-order method is capable of attaining a global minimum for training over-parameterized neural networks, where the number of parameters is significantly larger than that of training instances. Momentum methods, including heavy ball method (HB) and Nesterov's accelerated method (NAG), are the workhorse first-order gradient methods owning to their accelerated convergence. In practice, NAG often exhibits better performance than HB. However, current research fails to distinguish their convergence difference in training neural networks. Motivated by this, we provide convergence analysis of HB and NAG in training an over-parameterized two-layer neural network with ReLU activation, through the lens of high-resolution dynamical systems and neural tangent kernel (NTK) theory. Compared to existing works, our analysis not only establishes tighter upper bounds of the convergence rate for both HB and NAG, but also characterizes the effect of the gradient correction term, which leads to the acceleration of NAG over HB. Finally, we validate our theoretical result on three benchmark datasets.

翻译：由于一阶梯度方法的简单和效率,该方法被广泛用于神经网络的培训。虽然神经网络的最优化问题不是康维克斯,但最近的研究证明,第一阶方法能够达到对超分度神经网络培训的全球最低要求,其参数数量大大高于培训实例。动力方法,包括重球法和Nesterov加速法(NAG),是其加速趋同所拥有的第一阶梯度方法。在实践中,NAG的表现往往比HB好。然而,目前的研究未能区分其在培训神经网络中的趋同差异。为此,我们通过高分辨率动态系统和纳氏加速法理论的透镜,对HB和NAG的趋同进行了趋同分析,培训一个具有ReLU激活作用的超分度双层神经网络。与RLU相比,我们的分析不仅为HB和NAG的趋同率设定了较近的上限。我们最后对HAG的升级结果进行了理论校验。

相关内容

Neural Networks

关注 1649

神经网络（Neural Networks）是世界上三个最古老的神经建模学会的档案期刊:国际神经网络学会(INNS)、欧洲神经网络学会(ENNS)和日本神经网络学会(JNNS)。神经网络提供了一个论坛，以发展和培育一个国际社会的学者和实践者感兴趣的所有方面的神经网络和相关方法的计算智能。神经网络欢迎高质量论文的提交，有助于全面的神经网络研究，从行为和大脑建模，学习算法，通过数学和计算分析，系统的工程和技术应用，大量使用神经网络的概念和技术。这一独特而广泛的范围促进了生物和技术研究之间的思想交流，并有助于促进对生物启发的计算智能感兴趣的跨学科社区的发展。因此，神经网络编委会代表的专家领域包括心理学，神经生物学，计算机科学，工程，数学，物理。该杂志发表文章、信件和评论以及给编辑的信件、社论、时事、软件调查和专利信息。文章发表在五个部分之一:认知科学，神经科学，学习系统，数学和计算分析、工程和应用。官网地址：http://dblp.uni-trier.de/db/journals/nn/

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日