Recent work suggests that convolutional neural networks of different architectures learn to classify images in the same order. To understand this phenomenon, we revisit the over-parametrized deep linear network model. Our analysis reveals that, when the hidden layers are wide enough, the convergence rate of this model's parameters is exponentially faster along the directions of the larger principal components of the data, at a rate governed by the corresponding singular values. We term this convergence pattern the Principal Components bias (PC-bias). Empirically, we show how the PC-bias streamlines the order of learning of both linear and non-linear networks, more prominently at earlier stages of learning. We then compare our results to the simplicity bias, showing that the two biases can be observed independently and affect the order of learning in different ways. Finally, we discuss how the PC-bias may explain some of the benefits of early stopping and its connection to PCA, and why deep networks converge more slowly when trained with random labels.
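To make the claimed convergence pattern concrete, the following is a minimal numerical sketch, not the paper's deep linear network setup: for a single linear layer trained by gradient descent on the squared loss, the weight error along the i-th principal direction of the inputs is known to decay as (1 − η λ_i)^t, where λ_i is the corresponding covariance eigenvalue, so larger principal components converge faster. The dimensions, eigenvalue spectrum, and learning rate below are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's experiment):
# gradient descent on a single linear layer with squared loss. The error along
# the i-th principal direction decays roughly as (1 - lr * lam_i)^t, so
# directions with larger covariance eigenvalues converge faster.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2000
eigvals = np.arange(d, 0, -1).astype(float)          # covariance spectrum 5, 4, ..., 1
X = rng.standard_normal((n, d)) * np.sqrt(eigvals)   # axis-aligned principal directions
w_star = rng.standard_normal(d)                      # ground-truth linear teacher
y = X @ w_star

w = np.zeros(d)
lr = 0.05
for t in range(1, 201):
    grad = X.T @ (X @ w - y) / n                     # gradient of 0.5 * mean squared error
    w -= lr * grad
    if t % 50 == 0:
        # weight error projected on each principal direction (here: the coordinate axes);
        # the leading coordinates (larger eigenvalues) shrink fastest
        print(t, np.abs(w - w_star).round(4))
```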