A recent series of theoretical works showed that the dynamics of neural networks with a certain initialisation are well captured by kernel methods. Concurrent empirical work demonstrated that kernel methods can come close to the performance of neural networks on some image classification tasks. These results raise the question of whether neural networks only learn successfully if kernels also learn successfully, despite neural networks being more expressive. Here, we show theoretically that two-layer neural networks (2LNN) with only a few hidden neurons can beat the performance of kernel learning on a simple Gaussian mixture classification task. We study the high-dimensional limit where the number of samples is proportional to the input dimension, and show that while small 2LNN achieve near-optimal performance on this task, lazy training approaches such as random features and kernel methods do not. Our analysis is based on the derivation of a closed set of equations that track the learning dynamics of the 2LNN and thus allow us to extract the asymptotic performance of the network as a function of the signal-to-noise ratio and other hyperparameters. We finally illustrate how over-parametrising the neural network leads to faster convergence, but does not improve its final performance.
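To make the setup concrete, the following is a minimal numpy sketch of the kind of comparison described above: a small 2LNN trained with online SGD versus a lazy random-features readout on a high-dimensional Gaussian mixture. Every concrete choice here (the XOR-like arrangement of four clusters, the dimension, the signal-to-noise ratio, the learning rate, the number of random features, and the helper `sample`) is an illustrative assumption rather than the paper's exact protocol, and the hyperparameters are untuned placeholders.

```python
import numpy as np

# Illustrative sketch only: a small two-layer network (2LNN) trained with
# online SGD versus a frozen random-features readout on a Gaussian mixture.
# All data parameters and hyperparameters below are assumptions for the sketch.
rng = np.random.default_rng(0)

d, n_train, n_test = 400, 4000, 2000      # number of samples scales linearly with d
snr = 2.0                                  # assumed signal-to-noise ratio
mu1, mu2 = np.zeros(d), np.zeros(d)
mu1[0], mu2[1] = snr, snr                  # two orthogonal signal directions

def sample(n):
    """Four Gaussian clusters at +/-mu1 (label +1) and +/-mu2 (label -1)."""
    c = rng.integers(4, size=n)
    means = np.stack([mu1, -mu1, mu2, -mu2])
    y = np.where(c < 2, 1.0, -1.0)
    X = means[c] + rng.standard_normal((n, d))
    return X, y

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

# --- Small 2LNN with K hidden units, trained by online SGD on the squared loss ---
K, lr, epochs = 8, 0.1, 50
W = rng.standard_normal((K, d)) / np.sqrt(d)   # first-layer weights
v = rng.standard_normal(K) / np.sqrt(K)        # second-layer weights

for _ in range(epochs):
    for i in rng.permutation(n_train):
        x, y = Xtr[i], ytr[i]
        h = np.tanh(W @ x)                     # hidden activations
        err = v @ h - y                        # squared-loss residual
        grad_v = err * h
        grad_W = err * np.outer(v * (1.0 - h**2), x)
        v -= lr * grad_v / K
        W -= lr * grad_W / d

acc_nn = np.mean(np.sign(np.tanh(Xte @ W.T) @ v) == yte)

# --- Lazy baseline: frozen random first layer, ridge regression on the readout ---
P, ridge = 2000, 1e-2                          # number of random features (assumed)
F = rng.standard_normal((P, d)) / np.sqrt(d)
Phi_tr, Phi_te = np.tanh(Xtr @ F.T), np.tanh(Xte @ F.T)
a = np.linalg.solve(Phi_tr.T @ Phi_tr + ridge * np.eye(P), Phi_tr.T @ ytr)
acc_rf = np.mean(np.sign(Phi_te @ a) == yte)

print(f"small 2LNN test accuracy:      {acc_nn:.3f}")
print(f"random features test accuracy: {acc_rf:.3f}")
```

The essential contrast the sketch tries to expose is the one in the abstract: the 2LNN adapts its first-layer weights to the data during training, whereas the random-features model keeps them frozen and only fits a linear readout, which is what "lazy training" refers to here.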