Double-descent curves in neural networks describe the phenomenon whereby the generalisation error first decreases as the number of parameters grows, then increases after passing an optimal parameter count that is smaller than the number of data points, and finally decreases again in the overparameterised regime. Here we use a neural network Gaussian process (NNGP), which maps exactly onto a fully connected network (FCN) in the infinite-width limit, combined with techniques from random matrix theory, to calculate this generalisation behaviour, with a particular focus on the overparameterised regime. We verify our predictions with numerical simulations of the corresponding Gaussian process regressions. An advantage of our NNGP approach is that the analytical calculations are easier to interpret. We argue that neural networks generalise well in the overparameterised regime precisely because that is where they converge to their equivalent Gaussian process.
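As an illustration of the kind of Gaussian process regression referred to above (not the paper's own code), the following minimal NumPy sketch performs GP regression with the NNGP kernel of an infinite-width one-hidden-layer ReLU FCN, for which the kernel takes the order-1 arc-cosine form. The function names (`relu_nngp_kernel`, `nngp_predict`), the hyperparameters `sigma_w2`, `sigma_b2`, `noise`, and the toy data set are assumptions made purely for illustration.

```python
# Minimal sketch (illustrative only): GP regression with the NNGP kernel of an
# infinite-width one-hidden-layer ReLU FCN, assuming the order-1 arc-cosine
# kernel form; all names and hyperparameters here are illustrative choices.
import numpy as np

def relu_nngp_kernel(X1, X2, sigma_w2=1.0, sigma_b2=0.0):
    """NNGP kernel for an infinite-width one-hidden-layer ReLU network.

    K(x, x') = sigma_b2 + sigma_w2/(2*pi) * |x||x'| * (sin t + (pi - t) cos t),
    where t is the angle between x and x'.
    """
    norms1 = np.linalg.norm(X1, axis=1, keepdims=True)   # shape (n1, 1)
    norms2 = np.linalg.norm(X2, axis=1, keepdims=True)   # shape (n2, 1)
    cos_t = np.clip(X1 @ X2.T / (norms1 * norms2.T + 1e-12), -1.0, 1.0)
    t = np.arccos(cos_t)
    J = np.sin(t) + (np.pi - t) * cos_t
    return sigma_b2 + sigma_w2 / (2 * np.pi) * (norms1 * norms2.T) * J

def nngp_predict(X_train, y_train, X_test, noise=1e-3):
    """Posterior mean of GP regression with the NNGP kernel and observation noise."""
    K_tt = relu_nngp_kernel(X_train, X_train)
    K_st = relu_nngp_kernel(X_test, X_train)
    alpha = np.linalg.solve(K_tt + noise * np.eye(len(X_train)), y_train)
    return K_st @ alpha

# Toy check: fit a smooth target from a few samples and report the test error.
rng = np.random.default_rng(0)
d, n_train, n_test = 5, 50, 200
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
w_true = rng.standard_normal(d)
y_train = np.tanh(X_train @ w_true)
y_test = np.tanh(X_test @ w_true)

y_pred = nngp_predict(X_train, y_train, X_test)
print("test MSE:", np.mean((y_pred - y_test) ** 2))
```

In this setup the test error of the GP posterior mean plays the role of the generalisation error that the NNGP analysis predicts for the corresponding infinite-width network.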