Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit

To theoretically understand the behavior of trained deep neural networks, it is necessary to study the dynamics induced by gradient methods from a random initialization. However, the nonlinear and compositional structure of these models makes these dynamics difficult to analyze. To overcome these challenges, large-width asymptotics have recently emerged as a fruitful viewpoint and have led to practical insights on real-world deep networks. For two-layer neural networks, these asymptotics have shown that the nature of the trained model changes radically depending on the scale of the initial random weights, ranging from a kernel regime (for large initial variance) to a feature learning regime (for small initial variance). For deeper networks more regimes are possible, and in this paper we study in detail a specific choice of "small" initialization corresponding to "mean-field" limits of neural networks, which we call integrable parameterizations (IPs). First, we show that under standard i.i.d. zero-mean initialization, integrable parameterizations of neural networks with more than four layers start at a stationary point in the infinite-width limit and no learning occurs. We then propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics. In particular, one of these methods consists in using large initial learning rates, and we show that it is equivalent to a modification of the recently proposed maximal update parameterization $\mu$P. We confirm our results with numerical experiments on image classification tasks, which additionally show a strong difference in behavior between various choices of activation functions that is not yet captured by theory.
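The vanishing behavior under zero-mean initialization can be illustrated with a small numerical sketch. The setup below is an assumption made for illustration and is not taken from the paper: constant width, tanh activations, i.i.d. N(0, 1) weights, and a 1/width factor on every hidden-to-hidden layer (a mean-field style scaling). The function name `preactivation_rms` and its parameters are hypothetical. Under these assumptions, each such layer averages `width` zero-mean terms, so the pre-activations shrink by roughly a factor of 1/sqrt(width) per layer, which is consistent with signals vanishing as the width grows.

```python
import math
import random


def preactivation_rms(width, n_hidden_layers, seed=0):
    """Return the RMS pre-activation of each hidden layer after the first.

    Hypothetical setup (not taken from the paper): constant width, tanh
    activations, i.i.d. N(0, 1) weights, and a 1/width factor on every
    hidden-to-hidden layer, i.e. a mean-field style scaling.
    """
    rng = random.Random(seed)
    # Stand-in for the first-layer activations: entries of order 1.
    x = [math.tanh(rng.gauss(0.0, 1.0)) for _ in range(width)]
    rms_per_layer = []
    for _ in range(n_hidden_layers):
        # h_i = (1/width) * sum_j w_ij * x_j with zero-mean i.i.d. weights w_ij.
        h = [
            sum(rng.gauss(0.0, 1.0) * xj for xj in x) / width
            for _ in range(width)
        ]
        rms_per_layer.append(math.sqrt(sum(v * v for v in h) / width))
        x = [math.tanh(v) for v in h]
    return rms_per_layer


if __name__ == "__main__":
    # Print the per-layer RMS for a few widths to see the decay.
    for width in (32, 128, 512):
        rms = preactivation_rms(width, n_hidden_layers=2)
        print(width, ["%.2e" % r for r in rms])
```

Running the script prints one row per width. The values should decrease both with depth at a fixed width and with width at a fixed depth. This only probes the forward pass at initialization. It does not reproduce the paper's analysis of the training dynamics or of the proposed remedies such as large initial learning rates.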

