The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent work has studied $\textit{benign overfitting}$, a phenomenon where some interpolating methods approach Bayes optimality, even in the presence of noise. In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks $\textit{do not fit benignly}$: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime. We call this intermediate regime $\textit{tempered overfitting}$, and we initiate its systematic study. We first explore this phenomenon in the context of kernel (ridge) regression (KR) by obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors. We find that kernels with power-law spectra, including Laplace kernels and ReLU neural tangent kernels, exhibit tempered overfitting. We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning.
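To make the trichotomy concrete, one natural way to formalize it (a sketch consistent with the abstract's description of the three regimes, not necessarily the paper's exact definitions) is in terms of the asymptotic excess risk of the trained predictor $\hat f_n$ over the Bayes risk $\sigma^2$ as the number of noisy training samples $n$ grows:
\[
\mathcal{E}_n \;=\; \mathbb{E}\!\left[\big(\hat f_n(x) - y\big)^2\right] - \sigma^2,
\qquad
\lim_{n\to\infty} \mathcal{E}_n \;=\;
\begin{cases}
0 & \text{(benign),}\\[2pt]
c \in (0,\infty) & \text{(tempered),}\\[2pt]
\infty & \text{(catastrophic).}
\end{cases}
\]
Under this reading, the claim about kernels with power-law spectra (e.g., Laplace kernels and ReLU neural tangent kernels) is that, at interpolation, $\mathcal{E}_n$ neither vanishes nor diverges but settles at a finite constant determined by the noise level and the spectral decay.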