Innovations in neural architectures have fostered significant breakthroughs in language modeling and computer vision. Unfortunately, novel architectures often lead to challenging hyperparameter choices and training instability if the network parameters are not properly initialized. A number of architecture-specific initialization schemes have been proposed, but these schemes are not always portable to new architectures. This paper presents GradInit, an automated and architecture-agnostic method for initializing neural networks. GradInit is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam with prescribed hyperparameters results in the smallest possible loss value. This adjustment is done by introducing a scalar multiplier variable in front of each parameter block, and then optimizing these variables using a simple numerical scheme. GradInit accelerates the convergence and improves the test performance of many convolutional architectures, both with and without skip connections, and even without normalization layers. It also improves the stability of the original Transformer architecture for machine translation, enabling it to be trained without learning rate warmup using either Adam or SGD under a wide range of learning rates and momentum coefficients. Code is available at https://github.com/zhuchen03/gradinit.
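To make the heuristic concrete, the sketch below illustrates one possible way to implement it in PyTorch: each parameter block is rescaled by a learnable scalar, a single SGD step with a prescribed learning rate is simulated, and the post-step loss is minimized with respect to the scalars. This is a minimal illustration under assumed names (`model`, `loss_fn`, `batch1`, `batch2`, `eta`, `meta_lr`), not the authors' implementation; see the linked repository for the actual code.

```python
import torch
from torch.func import functional_call


def make_scales(model):
    # One learnable scalar multiplier per parameter block, initialized to 1.
    return {name: torch.ones((), requires_grad=True)
            for name, _ in model.named_parameters()}


def gradinit_step(model, loss_fn, batch1, batch2, scales, eta=0.1, meta_lr=1e-3):
    """One update of the per-block scale factors (illustrative sketch)."""
    params = dict(model.named_parameters())
    # Rescale every parameter block by its scalar multiplier.
    scaled = {name: scales[name] * p for name, p in params.items()}

    # Loss and gradients at the rescaled initialization.
    x1, y1 = batch1
    loss1 = loss_fn(functional_call(model, scaled, (x1,)), y1)
    grads = torch.autograd.grad(loss1, list(scaled.values()), create_graph=True)

    # Simulate a single SGD step with the prescribed learning rate eta.
    stepped = {name: w - eta * g
               for (name, w), g in zip(scaled.items(), grads)}

    # Loss after the simulated step: the quantity the scales are tuned to minimize.
    x2, y2 = batch2
    loss2 = loss_fn(functional_call(model, stepped, (x2,)), y2)

    # Gradient step on the scalar multipliers only; the weights themselves stay fixed.
    scale_grads = torch.autograd.grad(loss2, list(scales.values()))
    with torch.no_grad():
        for s, g in zip(scales.values(), scale_grads):
            s -= meta_lr * g
    return loss2.item()
```

Running `gradinit_step` for a number of iterations before regular training, and then multiplying each parameter block by its learned scale, would correspond to the initialization adjustment described above; the published method additionally handles Adam-style steps and constrains the scales, which this sketch omits.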