Over-parameterized deep neural networks (DNNs) with sufficient capacity to memorize random noise can nonetheless achieve excellent generalization performance, challenging the bias-variance trade-off of classical learning theory. Recent studies claim that DNNs first learn simple patterns and only later memorize noise; other works show that DNNs exhibit a spectral bias, learning target functions from low to high frequencies during training. However, we show that the monotonicity of this learning bias does not always hold: under the experimental setup of deep double descent, the high-frequency components of DNNs diminish in the late stage of training, leading to the second descent of the test error. Moreover, we find that the spectrum of DNNs can be used to indicate the second descent of the test error, even though it is computed from the training set alone.