We investigate the spectral properties of linear-width feed-forward neural networks, where the sample size is asymptotically proportional to the network width. Empirically, we show that the weight spectra in this high-dimensional regime are invariant when the network is trained by gradient descent with small constant learning rates, and that the changes in both the operator and Frobenius norms are $\Theta(1)$ in the limit. This implies that the bulk spectra of both the conjugate kernel and the neural tangent kernel are also invariant. We demonstrate similar characteristics for models trained with mini-batch (stochastic) gradient descent with small learning rates, and provide a theoretical justification for this special scenario. When the learning rate is large, we show empirically that an outlier eigenvalue emerges, with its corresponding eigenvector aligned with the structure of the training data. We also show that after adaptive gradient training, which attains lower test error and where feature learning emerges, both the weight and kernel matrices exhibit heavy-tailed behavior. These different spectral properties, namely the invariant bulk, the spike, and the heavy-tailed distribution, correlate with how far the kernels deviate from initialization. To understand this phenomenon better, we focus on a toy model, a two-layer network trained on synthetic data, which exhibits different spectral properties under different training strategies. Analogous phenomena also appear when we train conventional neural networks on real-world data. Our results show that monitoring the evolution of the spectra during training is an important step toward understanding the training dynamics and feature learning.
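The following is a minimal sketch, not the paper's exact experimental setup, of the kind of spectral monitoring described above: a two-layer ReLU network on synthetic Gaussian data in the linear-width regime ($n \propto$ width), trained by full-batch gradient descent with a small constant learning rate, after which the singular-value spectrum and norm change of the hidden-layer weights are compared to initialization. All dimensions, the learning rate, and the step count are illustrative choices.

```python
# Sketch: spectral monitoring of a two-layer ReLU network (assumed setup).
import numpy as np

rng = np.random.default_rng(0)
d, width, n = 256, 256, 512          # input dim, hidden width, sample size (n ~ width)
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)

W = rng.standard_normal((width, d)) / np.sqrt(d)   # first-layer weights
a = rng.standard_normal(width) / np.sqrt(width)    # second-layer weights
W0 = W.copy()

lr, steps = 0.1, 200                 # small constant learning rate, illustrative
for _ in range(steps):
    H = np.maximum(X @ W.T, 0.0)     # hidden activations, shape (n, width)
    err = H @ a - y                  # squared-loss residual
    grad_a = H.T @ err / n
    grad_H = np.outer(err, a) * (H > 0)            # backprop through ReLU
    grad_W = grad_H.T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

# Compare spectra: for small lr the bulk of singular values barely moves,
# and the Frobenius norm of the update stays O(1) relative to ||W0||_F.
s0 = np.linalg.svd(W0, compute_uv=False)
s1 = np.linalg.svd(W, compute_uv=False)
print("top singular value: init %.3f -> trained %.3f" % (s0[0], s1[0]))
print("||W - W0||_F = %.3f vs ||W0||_F = %.3f"
      % (np.linalg.norm(W - W0), np.linalg.norm(W0)))
```

Under these assumptions, repeating the run with a much larger learning rate or an adaptive optimizer is the natural way to look for the spike and heavy-tailed behavior discussed above; the sketch only illustrates the small-learning-rate case.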