Recent findings (e.g., arXiv:2103.00065) demonstrate that modern neural networks trained by full-batch gradient descent (GD) typically enter a regime called the Edge of Stability (EOS). In this regime, the sharpness, i.e., the maximum eigenvalue of the Hessian, first increases to the value 2/(step size) (the progressive sharpening phase) and then oscillates around this value (the EOS phase). This paper analyzes the GD dynamics and the sharpness along the optimization trajectory. Our analysis naturally divides the GD trajectory into four phases according to the change in sharpness. We empirically identify the norm of the output-layer weights as an interesting indicator of the sharpness dynamics. Based on this empirical observation, we attempt to explain, both theoretically and empirically, the dynamics of the key quantities that drive the change in sharpness in each phase of EOS. Moreover, under certain assumptions, we provide a theoretical proof of the sharpness behavior in the EOS regime for two-layer fully-connected linear neural networks. We also discuss some other empirical findings and the limitations of our theoretical results.
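To make the 2/(step size) threshold concrete, here is a minimal sketch that is not taken from the paper. It assumes a fixed quadratic loss 0.5 * x^T H x, for which the Hessian H (and hence the sharpness) is constant, so it cannot reproduce progressive sharpening or the EOS oscillation; it only illustrates why GD is stable when the sharpness is below 2/(step size) and diverges when it is above. The names `sharpness` and `run_gd`, the matrix `H`, and the step sizes are illustrative choices.

```python
import numpy as np


def sharpness(hessian):
    """Return the largest eigenvalue of a symmetric Hessian matrix."""
    # eigvalsh returns eigenvalues in ascending order for symmetric input.
    return float(np.linalg.eigvalsh(hessian)[-1])


def run_gd(hessian, x0, step_size, num_steps):
    """Run full-batch GD on the loss 0.5 * x^T H x and record the loss per step."""
    x = np.asarray(x0, dtype=float)
    losses = []
    for _ in range(num_steps):
        losses.append(float(0.5 * x @ hessian @ x))
        # The gradient of the quadratic loss is H x.
        x = x - step_size * (hessian @ x)
    return losses


# Illustrative (assumed) quadratic with sharpness 4.
H = np.diag([1.0, 4.0])
x0 = [1.0, 1.0]

# 2 / 0.4 = 5 exceeds the sharpness, so every direction contracts.
stable = run_gd(H, x0, step_size=0.4, num_steps=50)

# 2 / 0.6 is about 3.33, below the sharpness, so the sharpest direction blows up.
unstable = run_gd(H, x0, step_size=0.6, num_steps=50)

print("sharpness:", sharpness(H))
print("loss decreased with step 0.4:", stable[-1] < stable[0])
print("loss increased with step 0.6:", unstable[-1] > unstable[0])
```

Along each eigendirection with eigenvalue λ, one GD step multiplies the coordinate by (1 - step_size * λ), which has magnitude below 1 exactly when λ < 2/step_size. In a neural network the Hessian changes along the trajectory, which is what allows the sharpness to rise toward this threshold and then hover around it.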