For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform RKHS. This is true even in the wide-network limit, for a different scaling of the initialization. How can we reconcile the above claims? For which tasks do NNs outperform RKHS? If covariates are nearly isotropic, RKHS methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation. Here we show that this curse of dimensionality becomes milder if the covariates display the same low-dimensional structure as the target function, and we precisely characterize this tradeoff. Building on these results, we present the spiked covariates model, which captures in a unified framework both behaviors observed in earlier work. We hypothesize that such a latent low-dimensional structure is present in image classification. We test this hypothesis numerically by showing that specific perturbations of the training distribution degrade the performance of RKHS methods much more significantly than that of NNs.
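To make the spiked covariates setting concrete, the following is a minimal, hypothetical sketch rather than the exact construction analyzed in the paper: covariates carry a few high-variance ("spiked") directions, the target function depends only on those directions, and a kernel ridge regression baseline is compared against a small two-layer NN. The dimensions, spike size, kernel bandwidth, and network width below are illustrative choices, not values taken from the paper.

```python
# Illustrative sketch of a spiked-covariates experiment (assumed parameters, not the paper's).
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

n, d, d_spike, snr = 2000, 50, 2, 5.0   # samples, ambient dim, spiked dim, spike size (hypothetical)

def sample_data(n_samples):
    z = rng.standard_normal((n_samples, d))
    z[:, :d_spike] *= snr                # spiked coordinates carry larger variance
    y = np.tanh(z[:, 0]) * z[:, 1]       # target depends only on the spiked coordinates
    return z, y

X_train, y_train = sample_data(n)
X_test, y_test = sample_data(n)

# RKHS baseline: kernel ridge regression with an RBF kernel.
krr = KernelRidge(kernel="rbf", gamma=1.0 / d, alpha=1e-3).fit(X_train, y_train)

# Small two-layer NN (sklearn's MLP, trained with its default Adam optimizer for simplicity).
nn = MLPRegressor(hidden_layer_sizes=(256,), max_iter=2000, random_state=0).fit(X_train, y_train)

for name, model in [("kernel ridge", krr), ("two-layer NN", nn)]:
    err = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"{name}: test MSE = {err:.3f}")
```

In this toy setup the NN can adapt to the low-dimensional spiked directions, whereas the rotation-invariant kernel must fit a function of all d coordinates; varying snr or d_spike is one way to probe the tradeoff described above.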