Double-descent curves in neural networks describe the phenomenon whereby the generalisation error first decreases with increasing parameter count, then grows after reaching an optimal number of parameters that is smaller than the number of data points, and then decreases again in the overparameterised regime. Here we use a neural network Gaussian process (NNGP), which maps exactly to a fully connected network (FCN) in the infinite-width limit, combined with techniques from random matrix theory, to calculate this generalisation behaviour, with a particular focus on the overparameterised regime. An advantage of our NNGP approach is that the analytical calculations are easier to interpret. We argue that the generalisation performance of neural networks improves in the overparameterised regime precisely because that is where they converge to their equivalent Gaussian process.
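As a rough illustration of the NNGP-to-FCN correspondence mentioned above (not the exact kernel, architecture, or hyperparameters of this work), the sketch below performs GP regression with the infinite-width kernel of a one-hidden-layer ReLU FCN; the arc-cosine form of the kernel and the values of `sigma_w`, `sigma_b`, and `noise` are illustrative assumptions.

```python
# Minimal sketch: NNGP regression for an infinite-width, one-hidden-layer
# ReLU fully connected network. Hyperparameters are illustrative only.
import numpy as np

def nngp_kernel(X1, X2, sigma_w=1.0, sigma_b=0.1):
    """NNGP kernel of a one-hidden-layer ReLU FCN in the infinite-width limit."""
    d = X1.shape[1]
    # Layer-0 (input) covariances.
    K12 = sigma_b**2 + sigma_w**2 * (X1 @ X2.T) / d
    K11 = sigma_b**2 + sigma_w**2 * np.sum(X1**2, axis=1) / d  # diagonal terms
    K22 = sigma_b**2 + sigma_w**2 * np.sum(X2**2, axis=1) / d
    # ReLU arc-cosine expectation for the hidden layer.
    norm = np.sqrt(np.outer(K11, K22))
    cos_theta = np.clip(K12 / norm, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    J = (np.sin(theta) + (np.pi - theta) * cos_theta) / (2 * np.pi)
    return sigma_b**2 + sigma_w**2 * norm * J

def nngp_predict(X_train, y_train, X_test, noise=1e-3):
    """Posterior mean of GP regression under the NNGP kernel."""
    K = nngp_kernel(X_train, X_train)
    K_star = nngp_kernel(X_test, X_train)
    alpha = np.linalg.solve(K + noise * np.eye(len(X_train)), y_train)
    return K_star @ alpha

# Toy usage: fit a 1-D target and report the test error of the posterior mean.
rng = np.random.default_rng(0)
X_tr = rng.uniform(-1, 1, size=(30, 1))
y_tr = np.sin(3 * X_tr[:, 0]) + 0.05 * rng.standard_normal(30)
X_te = np.linspace(-1, 1, 200).reshape(-1, 1)
y_pred = nngp_predict(X_tr, y_tr, X_te)
print("test MSE vs noiseless target:", np.mean((y_pred - np.sin(3 * X_te[:, 0]))**2))
```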