Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve test error close to the Bayes-optimal error. In contrast to previous work on benign overfitting that requires linear or kernel-based predictors, our analysis holds in a setting where both the model and the learning dynamics are fundamentally nonlinear.
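The setting described above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's actual experiment or analysis: it draws data from two well-separated Gaussian clusters (a log-concave class-conditional distribution), flips a fraction of the training labels, trains the first layer of a leaky-ReLU two-layer network by gradient descent on the logistic loss from random initialization, and reports training error on the noisy labels alongside test error on clean labels. All parameter values (dimension, sample size, separation, noise rate, width, step size) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic data: two well-separated Gaussian clusters (log-concave),
#     with a fraction of training labels flipped. Parameters are illustrative. ---
d, n, n_test = 500, 100, 2000          # dimension, train size, test size
sep, noise_rate = 4.0, 0.1             # cluster separation ||mu||, label-flip fraction
mu = sep * np.ones(d) / np.sqrt(d)     # class mean

def sample(n_samples):
    y = rng.choice([-1.0, 1.0], size=n_samples)
    x = y[:, None] * mu + rng.standard_normal((n_samples, d))
    return x, y

X, y_clean = sample(n)
flip = rng.random(n) < noise_rate
y = np.where(flip, -y_clean, y_clean)  # corrupted training labels
X_test, y_test = sample(n_test)        # test labels are clean

# --- Two-layer network with leaky-ReLU hidden units; only the first layer is
#     trained, second-layer weights are fixed random signs (a common simplification). ---
m, alpha, lr, steps = 512, 0.1, 0.5, 10000
W = rng.standard_normal((m, d)) / np.sqrt(d)   # random initialization
a = rng.choice([-1.0, 1.0], size=m) / m        # fixed output weights

def forward(X_in, W_in):
    Z = X_in @ W_in.T                          # (n, m) pre-activations
    H = np.where(Z > 0, Z, alpha * Z)          # leaky ReLU
    return H @ a                               # network output f(x)

for t in range(steps):
    out = forward(X, W)
    s = -y / (1.0 + np.exp(y * out))           # d(logistic loss)/d(output)
    Z = X @ W.T
    dphi = np.where(Z > 0, 1.0, alpha)         # leaky-ReLU derivative
    grad_W = ((s[:, None] * dphi) * a).T @ X / n
    W -= lr * grad_W                           # full-batch gradient descent step

train_err = np.mean(np.sign(forward(X, W)) != y)
test_err = np.mean(np.sign(forward(X_test, W)) != y_test)
print(f"train error (noisy labels): {train_err:.3f}")
print(f"test error  (clean labels): {test_err:.3f}")
```

In the overparameterized regime sketched here (dimension and width large relative to the sample size), the network can fit the flipped labels while the test error on clean labels remains close to the Bayes error of the mixture, which is the qualitative behavior the abstract refers to as benign overfitting.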