The ability of overparameterized deep networks to interpolate noisy data, while at the same time showing good generalization performance, has recently been characterized in terms of the double descent curve for the test error. Common intuition from polynomial regression suggests that overparameterized networks are able to sharply interpolate noisy data, without considerably deviating from the ground-truth signal, thus preserving their generalization ability. At present, a precise characterization of the relationship between interpolation and generalization for deep networks is missing. In this work, we quantify the sharpness with which neural network functions fit the training data, by studying the loss landscape w.r.t.\ the input variable locally around each training point, over volumes around cleanly- and noisily-labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, where noisy targets are predicted over large volumes around training data points, in contrast to existing intuition.
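As a rough illustration of what probing the loss landscape w.r.t.\ the input variable around a training point could look like, the following is a minimal PyTorch sketch. It assumes an $L_2$-ball perturbation protocol and averages the loss deviation over random directions; the model, radius, and sample count are placeholders and not the paper's actual protocol or metric.

```python
# Illustrative sketch (not the paper's exact metric): estimate input-space loss
# sharpness around a single training point by sampling perturbations with
# ||delta||_2 = eps and measuring how far the loss deviates from its value at
# the unperturbed input. All hyperparameters below are hypothetical.
import torch
import torch.nn as nn

def input_space_sharpness(model, loss_fn, x, y, eps=0.1, n_samples=64):
    """Mean loss deviation over random perturbations of radius eps around x."""
    model.eval()
    with torch.no_grad():
        base_loss = loss_fn(model(x), y)
        deviations = []
        for _ in range(n_samples):
            delta = torch.randn_like(x)
            delta = eps * delta / delta.norm()  # rescale to the sphere of radius eps
            perturbed_loss = loss_fn(model(x + delta), y)
            deviations.append((perturbed_loss - base_loss).abs().item())
    return sum(deviations) / len(deviations)

# Toy usage: a small MLP evaluated on one (possibly noisily-labelled) training pair.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(1, 10)   # one training input
y = torch.tensor([1])    # its label (clean or noisy)
print(input_space_sharpness(model, nn.CrossEntropyLoss(), x, y))
```

Under this kind of measure, a model that fits a noisy label only in a narrow spike around the training input would register high sharpness, whereas one that predicts the noisy target over a large volume around the point would register low sharpness, which is the distinction the abstract draws between small and large interpolating models.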