The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent work has studied benign overfitting, a phenomenon where some interpolating methods approach Bayes optimality, even in the presence of noise. In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks do not fit benignly: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime. We call this intermediate regime tempered overfitting, and we initiate its systematic study. We first explore this phenomenon in the context of kernel (ridge) regression (KR) by obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors. We find that kernels with power-law spectra, including Laplace kernels and ReLU neural tangent kernels, exhibit tempered overfitting. We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning.
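To make the taxonomy concrete, the following is a minimal illustrative sketch, not the paper's experimental setup: it fits kernel ridge regression with a Laplace kernel to noisy labels, once with a near-zero ridge (near-interpolation) and once with an explicit ridge, and scores both against clean targets so the reported test MSE is the excess risk. The sine target, noise level, and bandwidth `gamma` are arbitrary assumptions chosen for illustration.

```python
# Illustrative sketch of the three-regime taxonomy for kernel ridge regression.
# Assumptions (not from the paper): sin target, noise_std=0.3, gamma=10.0.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
target = lambda x: np.sin(2 * np.pi * x).ravel()  # assumed ground-truth function
noise_std = 0.3                                   # label noise level

n_train, n_test = 500, 2000
X_train = rng.uniform(0, 1, size=(n_train, 1))
y_train = target(X_train) + noise_std * rng.normal(size=n_train)
X_test = rng.uniform(0, 1, size=(n_test, 1))

for alpha in [1e-8, 1e-2]:  # near-interpolating vs. explicitly ridged
    # The Laplace kernel has a power-law eigenspectrum, the tempered case above.
    model = KernelRidge(alpha=alpha, kernel="laplacian", gamma=10.0)
    model.fit(X_train, y_train)
    train_mse = np.mean((model.predict(X_train) - y_train) ** 2)
    # Excess risk: test MSE against the *clean* targets (Bayes level is 0 here).
    excess = np.mean((model.predict(X_test) - target(X_test)) ** 2)
    print(f"alpha={alpha:g}  train MSE={train_mse:.4f}  excess risk={excess:.4f}")
```

Under tempered overfitting one would expect the near-interpolating fit to drive the train error to (almost) zero while its excess risk stays strictly positive but bounded, above the ridged fit yet far from diverging; a benign method would instead approach zero excess risk, and a catastrophic one would blow up.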