We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples, using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the total generalization error into contributions from different spectral components of the kernel, we identify a new spectral principle: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function. When data are sampled from a uniform distribution on a high-dimensional hypersphere, dot product kernels, including NTK, exhibit learning stages in which different frequency modes of the target function are learned. We verify our theory with simulations on synthetic data and the MNIST dataset.
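The spectral principle described above can be illustrated numerically. The following is a minimal sketch (not the authors' code) of kernel ridge regression with a simple polynomial dot-product kernel on inputs drawn uniformly from a hypersphere; the dimension d, the ridge parameter lam, the kernel choice, and the two target modes are hypothetical illustrative choices. As the number of training samples n grows, the relative error on the low-frequency (degree-1) mode of the target drops well before the error on the higher-frequency (degree-3) mode.

import numpy as np

rng = np.random.default_rng(0)
d, lam = 10, 1e-6  # input dimension and ridge parameter (assumed values)

def sphere(n):
    # n points drawn uniformly from the unit sphere S^{d-1} in R^d
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def kernel(X, Y):
    # A simple dot-product kernel; any dot-product kernel (e.g. NTK) behaves analogously
    return (1.0 + X @ Y.T) ** 3

w = sphere(1)[0]        # direction defining the target modes
X_te = sphere(5000)     # large test set for estimating generalization error
t_te = X_te @ w

# Two pure spectral modes of the target: a degree-1 harmonic and a degree-3
# harmonic (the cubic with its linear component projected out, since
# E[t^4]/E[t^2] = 3/(d+2) for t = x.w with x uniform on the sphere).
modes = {
    "degree-1": lambda t: t,
    "degree-3": lambda t: t**3 - 3.0 * t / (d + 2),
}

for n in (10, 100, 1000):
    X_tr = sphere(n)
    t_tr = X_tr @ w
    K = kernel(X_tr, X_tr) + lam * np.eye(n)
    K_cross = kernel(X_te, X_tr)
    report = [f"n={n:4d}"]
    for name, f in modes.items():
        y_tr, y_te = f(t_tr), f(t_te)
        pred = K_cross @ np.linalg.solve(K, y_tr)   # kernel ridge regression predictor
        rel_err = np.mean((pred - y_te) ** 2) / np.mean(y_te ** 2)
        report.append(f"{name} rel. error {rel_err:.2f}")
    print("  ".join(report))

Running this sketch, the degree-1 error is already small at moderate n while the degree-3 error remains near 1 (no better than predicting zero) until n is much larger, mirroring the learning stages described in the abstract for dot-product kernels on the hypersphere.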