Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence

From arXiv; 14 pages of main text, 51 pages including bibliography and appendix. Published in Transactions on Machine Learning Research (TMLR), 2023. https://openreview.net/forum?id=gZna3IiGfl

The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of this algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by SHB: (i) stability after dropping out part of the neurons, (ii) connectivity along a low-loss path, and (iii) convergence to the global optimum. To achieve this goal, we take a mean-field view and relate the SHB dynamics to a certain partial differential equation in the limit of large network widths. This mean-field perspective has inspired a recent line of work focusing on SGD while, in contrast, our paper considers an algorithm with momentum. More specifically, after proving existence and uniqueness of the limit differential equations, we show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network. Armed with this last bound, we are able to establish the dropout-stability and connectivity of SHB solutions.
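To make the objects in the abstract concrete, here is a minimal sketch of SHB (SGD with Polyak's momentum) on a two-layer network with mean-field 1/N output scaling, followed by a crude dropout check that evaluates the loss using only half of the trained neurons. Everything in it is an illustrative assumption rather than the paper's setup: the tanh activation, the toy regression target, the hyperparameters, the choice to scale the step by the width N, and the helper names `init_params`, `mean_field_net`, `loss`, `grads`, and `shb_train`. The paper's exact parameterization and scaling of the momentum term may differ.

```python
import numpy as np

def init_params(n_neurons, d, seed=0):
    # Hypothetical i.i.d. Gaussian initialization of output and input weights.
    rng = np.random.default_rng(seed)
    return rng.normal(size=n_neurons), rng.normal(size=(n_neurons, d))

def mean_field_net(x, a, w):
    # Two-layer network averaged over neurons: (1/N) * sum_i a_i * tanh(w_i . x).
    return np.tanh(x @ w.T) @ a / a.shape[0]

def loss(x, y, a, w):
    # Half the mean squared error.
    return 0.5 * np.mean((mean_field_net(x, a, w) - y) ** 2)

def grads(x, y, a, w):
    # Gradients of `loss` with respect to a (shape (N,)) and w (shape (N, d)).
    n, b = a.shape[0], x.shape[0]
    h = np.tanh(x @ w.T)                      # (batch, N)
    err = h @ a / n - y                       # (batch,)
    grad_a = h.T @ err / (n * b)
    grad_w = (err[:, None] * (1.0 - h ** 2) * a[None, :]).T @ x / (n * b)
    return grad_a, grad_w

def shb_train(x, y, a0, w0, steps=500, lr=0.05, beta=0.9, batch=32, seed=0):
    # Heavy ball update in its standard Polyak form:
    #   theta_{k+1} = theta_k - step * grad + beta * (theta_k - theta_{k-1}).
    # Assumption: the step is lr * N so each neuron moves at an O(1) rate.
    rng = np.random.default_rng(seed)
    a, w = a0.copy(), w0.copy()
    a_prev, w_prev = a.copy(), w.copy()
    n = a.shape[0]
    for _ in range(steps):
        idx = rng.integers(0, x.shape[0], size=batch)   # random mini-batch
        ga, gw = grads(x[idx], y[idx], a, w)
        a_new = a - lr * n * ga + beta * (a - a_prev)
        w_new = w - lr * n * gw + beta * (w - w_prev)
        a_prev, w_prev, a, w = a, w, a_new, w_new
    return a, w

# Toy regression problem (an assumption, not data from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 2))
y = np.tanh(x @ np.array([1.0, -1.0]))

a0, w0 = init_params(200, 2, seed=1)
a, w = shb_train(x, y, a0, w0)

# Dropout check: keep the first half of the neurons; the 1/N normalization
# inside mean_field_net rescales the remaining ones automatically.
half = a.shape[0] // 2
print("initial loss:", loss(x, y, a0, w0))
print("trained loss:", loss(x, y, a, w))
print("loss with half the neurons dropped:", loss(x, y, a[:half], w[:half]))
```

In this sketch, dropout-stability would correspond to the last two printed values being close to each other. The paper's result concerns how this gap behaves as the width grows, which a single run at a fixed width does not demonstrate.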

