Overparameterized deep networks can interpolate noisy data while still generalizing well. Common intuition from polynomial regression suggests that large networks are able to sharply interpolate noisy data without considerably deviating from the ground-truth signal. At present, a precise characterization of this phenomenon for deep networks is missing. In this work, we present an empirical study of the input-space smoothness of the loss landscape of deep networks over volumes around cleanly- and noisily-labeled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model-wise and epoch-wise double descent, with higher peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large interpolating models exhibit a smooth loss landscape, in which noisy targets are predicted over large volumes around the training points, in contrast to existing intuition.
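To make the notion of input-space sharpness concrete, below is a minimal sketch (not the paper's exact procedure) of one way to probe the loss landscape in a volume around a single training point: sample random perturbations inside an L-infinity ball of radius `eps` and record the largest loss increase. The function `input_space_sharpness`, the radius `eps`, the sample count `n_samples`, and the toy model are illustrative assumptions, not quantities from the study.

```python
# Hypothetical sketch: estimate input-space loss sharpness around one training point
# as the maximum loss increase over random perturbations within an eps-ball.
import torch
import torch.nn.functional as F

def input_space_sharpness(model, x, y, eps=0.05, n_samples=64):
    """Max loss increase over random L-inf perturbations of x with radius eps (assumed metric)."""
    model.eval()
    with torch.no_grad():
        base_loss = F.cross_entropy(model(x), y)
        worst = base_loss
        for _ in range(n_samples):
            delta = torch.empty_like(x).uniform_(-eps, eps)  # random point in the eps-ball
            worst = torch.maximum(worst, F.cross_entropy(model(x + delta), y))
    return (worst - base_loss).item()

# Usage with a tiny hypothetical classifier and a single (possibly noisily-labeled) input
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.randn(1, 3, 32, 32)   # one training image (batch of 1)
y = torch.tensor([3])           # its label, which may be noisy
print(input_space_sharpness(model, x, y))
```

Sweeping such a sharpness estimate over model sizes and training epochs, separately for clean and noisy labels, is the kind of measurement the abstract describes; the actual study may use a different sharpness definition or sampling scheme.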