Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes

Quadratic approximations of neural network loss landscapes have been used extensively to study the optimization of these networks. However, such an approximation usually holds only in a very small neighborhood of the minimum, so it cannot explain many phenomena observed during optimization. In this work, we study the structure of neural network loss functions, and its implications for optimization, in a region beyond the reach of a good quadratic approximation. Numerically, we observe that neural network loss functions possess a multiscale structure, manifested in two ways: (1) in a neighborhood of minima, the loss mixes a continuum of scales and grows subquadratically, and (2) in a larger region, the loss clearly shows several separate scales. Using the subquadratic growth, we are able to explain the Edge of Stability phenomenon [5] observed for the gradient descent (GD) method. Using the separate scales, we explain the working mechanism of learning rate decay through simple examples. Finally, we study the origin of the multiscale structure and propose that the non-convexity of the models and the non-uniformity of the training data are among its causes. By constructing a two-layer neural network problem, we show that training data with different magnitudes give rise to different scales of the loss function, producing subquadratic growth and multiple separate scales.
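To make the role of subquadratic growth concrete, here is a minimal one-dimensional sketch. It is not the construction used in the paper: the toy loss f(x) = |x|^p with p = 1.5, the function names, and the learning rates are all assumptions chosen for illustration. Because the curvature f''(x) = p(p-1)|x|^(p-2) grows without bound as x approaches the minimum, GD with a fixed learning rate cannot settle at x = 0. Instead, the iterates fall into a period-2 oscillation whose amplitude is set by the learning rate, and lowering the learning rate lets them move closer to the minimum.

```python
import math


def grad(x, p=1.5):
    """Derivative of the toy loss f(x) = |x|**p (subquadratic for 1 < p < 2)."""
    if x == 0.0:
        return 0.0
    return p * math.copysign(abs(x) ** (p - 1.0), x)


def curvature(x, p=1.5):
    """Second derivative of f(x) = |x|**p away from x = 0."""
    return p * (p - 1.0) * abs(x) ** (p - 2.0)


def run_gd(x0, lr, steps, p=1.5):
    """Plain gradient descent with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x, p)
    return x


def plateau_amplitude(lr, p=1.5):
    """Amplitude a of the oscillation x -> -x, from lr * p * a**(p-1) = 2 * a."""
    return (lr * p / 2.0) ** (1.0 / (2.0 - p))


if __name__ == "__main__":
    # Hypothetical settings: start at x = 1 with a "large" learning rate.
    x_large = run_gd(x0=1.0, lr=0.1, steps=200)
    print("lr=0.1  |x| =", abs(x_large), "predicted", plateau_amplitude(0.1))
    print("lr=0.1  curvature at plateau =", curvature(x_large))

    # Decay the learning rate and continue from where the first phase stopped.
    x_small = run_gd(x0=x_large, lr=0.01, steps=200)
    print("lr=0.01 |x| =", abs(x_small), "predicted", plateau_amplitude(0.01))
```

In this toy, the curvature at the point where the iterates settle works out to 2(p-1)/lr, so it is inversely proportional to the learning rate rather than being a fixed property of the minimum. Whether the same mechanism accounts quantitatively for the behavior of real networks is the subject of the paper and is not established by this sketch.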


