To theoretically understand the behavior of trained deep neural networks, it is necessary to study the dynamics induced by gradient methods from a random initialization. However, the nonlinear and compositional structure of these models makes these dynamics difficult to analyze. To overcome these challenges, large-width asymptotics have recently emerged as a fruitful viewpoint and led to practical insights on real-world deep networks. For two-layer neural networks, it has been understood via these asymptotics that the nature of the trained model radically changes depending on the scale of the initial random weights, ranging from a kernel regime (for large initial variance) to a feature learning regime (for small initial variance). For deeper networks, more regimes are possible, and in this paper we study in detail a specific choice of "small" initialization corresponding to "mean-field" limits of neural networks, which we call integrable parameterizations (IPs). First, we show that under standard i.i.d. zero-mean initialization, integrable parameterizations of neural networks with more than four layers start at a stationary point in the infinite-width limit, so that no learning occurs. We then propose various methods to avoid this trivial behavior and analyze the resulting dynamics in detail. In particular, one of these methods consists in using large initial learning rates, and we show that it is equivalent to a modification of the recently proposed maximal update parameterization $\mu$P. We confirm our results with numerical experiments on image classification tasks, which additionally show a strong difference in behavior between various choices of activation functions that is not yet captured by theory.