Generalization beyond a training dataset is a central goal of machine learning, but a theoretical understanding of generalization remains an open problem for many models. The need for a new theory is exacerbated by recent observations in deep neural networks, where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. In this paper, we investigate the generalization error of kernel regression, which, besides being a popular machine learning method, also includes infinitely overparameterized neural networks trained with gradient descent. We use techniques from statistical mechanics to derive an analytical expression for the generalization error applicable to any kernel or data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels, including those that arise from training deep neural networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with "simple functions", which are identified by solving a kernel eigenfunction problem on the data distribution. This notion of simplicity allows us to characterize whether a kernel is compatible with a learning task, facilitating good generalization performance from a small number of training examples. We show that more data may impair generalization when the data are noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks. To further understand these phenomena, we turn to the broad class of rotation-invariant kernels, which is relevant to training deep neural networks in the infinite-width limit, and present a detailed mathematical analysis of them when data is drawn from a spherically symmetric distribution and the number of input dimensions is large.
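To make the quantities mentioned above concrete, the following minimal sketch estimates kernel eigenvalues empirically from a Gram matrix (a Monte Carlo proxy for the kernel eigenfunction problem on the data distribution) and traces a kernel ridge regression learning curve on synthetic spherical data. The RBF kernel, linear target function, noise level, and ridge parameter are illustrative assumptions, not the paper's specific setup or derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(n, d):
    """Draw n points uniformly from the unit sphere in d dimensions."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def rbf_kernel(X, Y, length_scale=1.0):
    """A rotation-invariant RBF kernel (illustrative choice of kernel)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * length_scale**2))

d = 10

# Empirical kernel spectrum: eigendecompose the Gram matrix on a large sample
# and divide by the sample size, a Monte Carlo estimate of the eigenvalues of
# the kernel integral operator under the data distribution.
X_pool = sample_sphere(2000, d)
K_pool = rbf_kernel(X_pool, X_pool)
eigvals = np.linalg.eigvalsh(K_pool / len(X_pool))[::-1]
print("leading empirical kernel eigenvalues:", np.round(eigvals[:5], 4))

# Learning curve for kernel ridge regression on a simple target
# (a linear function of the inputs, i.e. a low-order mode for this kernel).
w = rng.standard_normal(d)
f = lambda X: X @ w
X_test = sample_sphere(1000, d)
y_test = f(X_test)

for n in [10, 50, 200, 800]:
    X_tr = sample_sphere(n, d)
    y_tr = f(X_tr) + 0.1 * rng.standard_normal(n)          # noisy labels
    K = rbf_kernel(X_tr, X_tr)
    alpha = np.linalg.solve(K + 1e-3 * np.eye(n), y_tr)    # ridge regularizer
    y_hat = rbf_kernel(X_test, X_tr) @ alpha
    print(f"n={n:4d}  test MSE={np.mean((y_hat - y_test)**2):.4f}")
```

Running the loop over increasing training-set sizes produces an empirical learning curve; varying the label noise or choosing a target outside the span of the kernel's dominant eigenfunctions is one way to probe the non-monotonic behavior discussed in the text.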