We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow. We show that in the underparameterized regime the network learns eigenfunctions of an integral operator $T_{K^\infty}$ determined by the Neural Tangent Kernel (NTK) at rates corresponding to their eigenvalues. For example, for uniformly distributed data on the sphere $S^{d - 1}$ and rotation-invariant weight distributions, the eigenfunctions of $T_{K^\infty}$ are the spherical harmonics. Our results can be understood as describing a spectral bias in the underparameterized regime. The proofs use the concept of "Damped Deviations", where deviations of the NTK matter less for eigendirections with large eigenvalues due to the occurrence of a damping factor. Beyond the underparameterized regime, the damped-deviations point of view can be used to track the dynamics of the empirical risk in the overparameterized setting, allowing us to extend certain results in the literature. We conclude that damped deviations offer a simple and unifying perspective on the dynamics when optimizing the squared error.
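As a rough numerical sketch of the spectral bias described above (an illustration under a simplifying assumption, not the paper's construction): if the kernel is frozen at a fixed Gram matrix $K$, gradient flow on the squared error gives linear residual dynamics $\dot r = -K r$, so the component of the residual along the $i$-th eigenvector decays like $e^{-\lambda_i t}$ and large-eigenvalue directions are learned first.

```python
import numpy as np

# Illustrative only: frozen-kernel gradient flow dr/dt = -K r has the
# closed-form solution r(t) = Q exp(-diag(lam) t) Q^T r(0), where
# K = Q diag(lam) Q^T. The matrix K below is a random PSD stand-in for
# an NTK Gram matrix, not a kernel from the paper.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)          # random PSD "kernel" Gram matrix
lam, Q = np.linalg.eigh(K)           # eigenvalues (ascending), eigenvectors

r0 = rng.standard_normal(n)          # initial residual u(0) - y
t = 0.3
r_t = Q @ (np.exp(-lam * t) * (Q.T @ r0))   # residual at time t

# Each eigen-component decays at its own rate exp(-lambda_i * t):
for i in range(n):
    assert np.isclose(Q[:, i] @ r_t, np.exp(-lam[i] * t) * (Q[:, i] @ r0))

# The largest-eigenvalue direction is damped the most by time t.
assert np.exp(-lam[-1] * t) < np.exp(-lam[0] * t)
```

The same damping factor $e^{-\lambda_i(t-s)}$ is what suppresses the contribution of kernel deviations $K_t - K^\infty$ along large-eigenvalue directions in the Damped Deviations argument.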