One of the central features of deep learning is the generalization ability of neural networks, which seems to improve relentlessly with over-parametrization. In this work, we investigate how properties of the data affect the test error as a function of the number of training examples and the number of training parameters; in other words, how the structure of the data shapes the "generalization phase space". We first focus on the random features model trained in the teacher-student scenario. The synthetic input data are composed of independent blocks, which allow us to tune the saliency of low-dimensional structures and their relevance to the target function. Using methods from statistical physics, we obtain analytical expressions for the training and test errors of both regression and classification tasks in the high-dimensional limit. The derivation allows us to show that label noise and strong anisotropy of the input data play similar roles in the test error. Both promote an asymmetry of the phase space in which increasing the number of training examples improves generalization more than increasing the number of training parameters. Our analytical insights are confirmed by numerical experiments with fully-connected networks trained on MNIST and CIFAR10.
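To make the setup concrete, the sketch below is a minimal numerical illustration, not the paper's analytical derivation: Gaussian inputs organized into independent blocks with different variances (to mimic anisotropy), a linear teacher with additive label noise, and a random features student whose second-layer weights are fit by ridge regression. All specific values (block sizes, block variances, noise level, number of features, regularization strength) are illustrative assumptions chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic data: inputs made of independent blocks with different variances.
# (Hypothetical parameters; the saliency/relevance knobs used in the paper are
# not reproduced here, these numbers are purely illustrative.)
d_blocks = [60, 40]            # block dimensions, total input dimension d = 100
block_std = [1.0, 0.2]         # per-block standard deviation controls anisotropy
d = sum(d_blocks)

def sample_inputs(n):
    """Draw n inputs whose coordinates are grouped into independent Gaussian blocks."""
    cols = [s * rng.standard_normal((n, k)) for k, s in zip(d_blocks, block_std)]
    return np.concatenate(cols, axis=1)

# --- Teacher: a fixed linear target (teacher-student scenario) with label noise.
teacher = rng.standard_normal(d) / np.sqrt(d)
noise_std = 0.1                # label noise, one of the knobs discussed in the abstract

def labels(X):
    return X @ teacher + noise_std * rng.standard_normal(X.shape[0])

# --- Student: random features phi(x) = relu(F x / sqrt(d)), ridge regression on top.
p = 200                        # number of random features ("training parameters")
F = rng.standard_normal((p, d))
ridge = 1e-3                   # ridge regularization strength (assumed)

def features(X):
    return np.maximum(F @ X.T / np.sqrt(d), 0.0).T   # (n, p) ReLU random features

n_train, n_test = 300, 2000
X_tr, X_te = sample_inputs(n_train), sample_inputs(n_test)
y_tr, y_te = labels(X_tr), labels(X_te)

Phi_tr, Phi_te = features(X_tr), features(X_te)
# Closed-form ridge solution for the second-layer weights (only these are trained).
w = np.linalg.solve(Phi_tr.T @ Phi_tr + ridge * np.eye(p), Phi_tr.T @ y_tr)

train_err = np.mean((Phi_tr @ w - y_tr) ** 2)
test_err = np.mean((Phi_te @ w - y_te) ** 2)
print(f"train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

Sweeping `n_train` and `p` in such a simulation traces out an empirical version of the "generalization phase space" discussed above, where one can probe how block anisotropy and label noise tilt the trade-off between adding examples and adding parameters.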