Global Convergence of Three-layer Neural Networks in the Mean Field Regime

From arXiv; appears in ICLR 2021. This is the conference version of arXiv:2001.11443, which contains a treatment of multilayer neural networks and their global convergence.

In the mean field regime, neural networks are appropriately scaled so that, as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit. This offers a way to study large-width neural networks by analyzing the mean field limit. Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees. The extension to multilayer networks, however, has been a highly challenging puzzle, and little is known about the optimization efficiency in the mean field regime when there are more than two layers. In this work, we prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime. We first develop a rigorous framework to establish the mean field limit of three-layer networks under stochastic gradient descent training. To that end, we propose the idea of a neuronal embedding, which comprises a fixed probability space that encapsulates neural networks of arbitrary sizes. The identified mean field limit is then used to prove a global convergence guarantee under suitable regularity and convergence mode assumptions, which, unlike previous works on two-layer networks, does not rely critically on convexity. Underlying the result is a universal approximation property, natural to neural networks, which importantly is shown to hold at any finite training time (not necessarily at convergence) via an algebraic topology argument.
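To make the phrase "appropriately scaled" concrete, here is a minimal sketch of a three-layer forward pass in which each hidden-layer sum is replaced by an average (a 1/n factor). The exact parameterization is an assumption for illustration, not taken from the abstract; the names `init_params` and `forward`, the tanh activation, and the Gaussian initialization are all hypothetical choices.

```python
import numpy as np

def init_params(d, n1, n2, seed=0):
    """Draw O(1) i.i.d. weights; widths n1 and n2 are free to grow (assumed init)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.standard_normal((n1, d)),   # input -> first hidden layer
        "w2": rng.standard_normal((n2, n1)),  # first -> second hidden layer
        "w3": rng.standard_normal(n2),        # second hidden layer -> output
    }

def forward(params, x, act=np.tanh):
    """Three-layer forward pass with mean field (averaging) scaling."""
    h1 = act(params["w1"] @ x)                 # shape (n1,)
    # Average over the n1 first-layer neurons instead of summing them.
    h2 = act(params["w2"] @ h1 / h1.size)      # shape (n2,)
    # Average over the n2 second-layer neurons to get a scalar output.
    return float(params["w3"] @ h2 / h2.size)

if __name__ == "__main__":
    x = np.array([1.0, 2.0])
    for width in (10, 100, 1000):
        p = init_params(d=2, n1=width, n2=width)
        print(width, forward(p, x))
```

Because every layer averages rather than sums, the output stays on the same scale as the width grows, which is the property that lets one speak of a limit as the width tends to infinity. With all-ones weights and an identity activation, for instance, the output does not depend on the widths at all.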

