The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent work has studied $\textit{benign overfitting}$, a phenomenon where some interpolating methods approach Bayes optimality, even in the presence of noise. In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks $\textit{do not fit benignly}$: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime. We call this intermediate regime $\textit{tempered overfitting}$, and we initiate its systematic study. We first explore this phenomenon in the context of kernel (ridge) regression (KR) by obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors. We find that kernels with power-law spectra, including Laplace kernels and ReLU neural tangent kernels, exhibit tempered overfitting. We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning.
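To make the trichotomy concrete, one natural way to formalize it (a sketch consistent with the abstract's description of the three regimes, not necessarily the paper's exact definitions) is in terms of the asymptotic excess risk of the trained predictor $\hat f_n$ over the Bayes risk $\sigma^2$ as the number of noisy training samples $n$ grows:
\[
\mathcal{E}_n \;=\; \mathbb{E}\!\left[\big(\hat f_n(x) - y\big)^2\right] - \sigma^2,
\qquad
\lim_{n\to\infty} \mathcal{E}_n \;=\;
\begin{cases}
0 & \text{(benign),}\\[2pt]
c \in (0,\infty) & \text{(tempered),}\\[2pt]
\infty & \text{(catastrophic).}
\end{cases}
\]
Under this reading, the claim about kernels with power-law spectra (e.g., Laplace kernels and ReLU neural tangent kernels) is that, at interpolation, $\mathcal{E}_n$ neither vanishes nor diverges but settles at a finite constant determined by the noise level and the spectral decay.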