Deep neural networks are widely used prediction algorithms whose performance often improves as the number of weights increases, leading to over-parametrization. We consider a two-layer neural network whose first layer is frozen while the last layer is trainable, known as the random feature model. We study over-parametrization in the context of a student-teacher framework by deriving a set of differential equations for the learning dynamics. For any finite ratio of hidden layer size to input dimension, the student cannot generalize perfectly, and we compute the non-zero asymptotic generalization error. Only when the student's hidden layer size is exponentially larger than the input dimension does an approach to perfect generalization become possible.
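To make the setup concrete, the sketch below illustrates a random feature model in a student-teacher setting: the student's first-layer weights are drawn at random and frozen, and only the second-layer weights are fit to the teacher's outputs. This is an illustrative assumption-laden toy, not the paper's analytical setup; the teacher architecture, the ReLU activation, and the use of ridge regression in place of the gradient-flow dynamics analyzed in the paper are all choices made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 100         # input dimension (assumed value for illustration)
p = 400         # student hidden layer size, so the ratio p/d = 4 is finite
n_train = 2000  # number of training examples

# Teacher: a fixed ground-truth mapping; here a single random direction with
# a ReLU, chosen purely for illustration.
w_teacher = rng.standard_normal(d) / np.sqrt(d)
def teacher(X):
    return np.maximum(X @ w_teacher, 0.0)

# Student: random feature model -- the first-layer weights F are frozen at
# random values; only the second-layer weights a are trained (here via ridge
# regression rather than the learning dynamics studied in the paper).
F = rng.standard_normal((d, p)) / np.sqrt(d)   # frozen first layer
def features(X):
    return np.maximum(X @ F, 0.0)               # ReLU random features

X_train = rng.standard_normal((n_train, d))
y_train = teacher(X_train)

Phi = features(X_train)
lam = 1e-3                                      # small ridge regularizer
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y_train)

# Generalization error on fresh data: for any finite ratio p/d this stays
# bounded away from zero, consistent with the asymptotic result.
X_test = rng.standard_normal((5000, d))
gen_error = np.mean((features(X_test) @ a - teacher(X_test)) ** 2)
print(f"p/d = {p/d:.1f}, test MSE = {gen_error:.4f}")
```

Increasing `p` at fixed `d` in this toy reduces the test error but does not drive it to zero, mirroring the claim that a finite hidden-layer-to-input-dimension ratio leaves a non-zero asymptotic generalization error.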