The ability of overparameterized deep networks to interpolate noisy data, while at the same time showing good generalization performance, has been recently characterized in terms of the double descent curve for the test error. Common intuition from polynomial regression suggests that overparameterized networks are able to sharply interpolate noisy data, without considerably deviating from the ground-truth signal, thus preserving generalization ability. At present, a precise characterization of the relationship between interpolation and generalization for deep networks is missing. In this work, we quantify the sharpness of fit of the training data interpolated by neural network functions, by studying the loss landscape w.r.t. the input variable locally around each training point, over volumes around cleanly- and noisily-labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, where noisy targets are predicted over large volumes around training data points, in contrast to existing intuition.
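To make the measurement concrete, below is a minimal sketch of how an input-space sharpness proxy around a single training point could be computed in PyTorch. The function name, the uniform-ball perturbation sampling, and the worst-case-loss-increase proxy are illustrative assumptions for a classification setting, not the paper's exact protocol.

```python
# Minimal sketch (assumed protocol, not the authors' exact method): estimate
# input-space loss sharpness around a training point by sampling perturbations
# in a small L2 ball and comparing the loss there to the loss at the point.
import torch
import torch.nn.functional as F

def input_space_sharpness(model, x, y, radius=0.05, n_samples=64):
    """Proxy for loss sharpness w.r.t. the input around one sample (x, y).

    Returns the largest increase of the cross-entropy loss over `n_samples`
    random perturbations drawn in an L2 ball of `radius` around x.
    `x` is a single input tensor, `y` a scalar class-label tensor.
    """
    model.eval()
    with torch.no_grad():
        base_loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        worst = base_loss
        for _ in range(n_samples):
            # Random direction, scaled to a random length within the ball.
            delta = torch.randn_like(x)
            delta = delta / delta.norm() * radius * torch.rand(1).item()
            loss = F.cross_entropy(model((x + delta).unsqueeze(0)),
                                   y.unsqueeze(0))
            worst = torch.maximum(worst, loss)
    return (worst - base_loss).item()
```

Averaging this quantity separately over cleanly- and noisily-labelled training samples, while sweeping model size or training epochs, would yield sharpness curves of the kind the abstract describes.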