Recent work suggests that convolutional neural networks of different architectures learn to classify images in the same order. To understand this phenomenon, we revisit the over-parametrized deep linear network model. Our asymptotic analysis, assuming that the hidden layers are wide enough, reveals that the parameters of this model converge exponentially faster along directions corresponding to the larger principal components of the data, at a rate governed by the corresponding singular values. We term this convergence pattern the Principal Components bias (PC-bias). We show how the PC-bias streamlines the order of learning of both linear and non-linear networks, more prominently at earlier stages of learning. We then compare our results to the spectral bias, showing that the two biases can be observed independently and affect the order of learning in different ways. Finally, we discuss how the PC-bias may explain some of the benefits of early stopping and its connection to PCA, and why deep networks converge more slowly when given random labels.
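The following is a minimal, hypothetical sketch (not the paper's code) illustrating the PC-bias claim on synthetic data: a three-layer over-parametrized linear network is trained with full-batch gradient descent, and the residual of the end-to-end map is tracked along the principal directions of the data. All names, dimensions, and hyperparameters below are illustrative assumptions; under them, the residual along the leading principal components should shrink markedly faster than along the trailing ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width, steps, lr = 500, 20, 256, 301, 0.01

# Synthetic data with a decaying spectrum, and a random linear teacher.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
spectrum = np.linspace(3.0, 0.3, d)
X = rng.standard_normal((n, d)) @ np.diag(spectrum) @ U.T
W_star = rng.standard_normal((d, d)) / np.sqrt(d)
Y = X @ W_star.T

# Standard 1/sqrt(fan_in) initialization; hidden layers are wide relative to d.
W1 = rng.standard_normal((width, d)) / np.sqrt(d)
W2 = rng.standard_normal((width, width)) / np.sqrt(width)
W3 = rng.standard_normal((d, width)) / np.sqrt(width)

# Principal directions of the data (rows of Vt, ordered by singular value).
_, _, Vt = np.linalg.svd(X, full_matrices=False)

for t in range(steps):
    # Forward pass of the deep linear network: out = X (W3 W2 W1)^T.
    H1 = X @ W1.T
    H2 = H1 @ W2.T
    out = H2 @ W3.T
    err = out - Y  # residual, shape (n, d)

    # Full-batch gradients of the mean squared error.
    G3 = err.T @ H2 / n
    G2 = (err @ W3).T @ H1 / n
    G1 = (err @ W3 @ W2).T @ X / n
    W1 -= lr * G1
    W2 -= lr * G2
    W3 -= lr * G3

    if t % 100 == 0:
        W_end2end = W3 @ W2 @ W1
        # Error of the end-to-end map along each principal direction of the data.
        per_direction = np.linalg.norm((W_end2end - W_star) @ Vt.T, axis=0)
        print(t, "top PCs:", per_direction[:3].round(3),
              "bottom PCs:", per_direction[-3:].round(3))
```

In this sketch the error along the top principal components collapses within a few hundred steps, while the error along the smallest components barely moves, which is the qualitative signature of the PC-bias described in the abstract.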