We study the implicit bias of stochastic gradient descent to favor low-depth solutions when training deep neural networks. Recent results in the literature suggest that the penultimate-layer representations learned by a classifier over multiple classes exhibit a clustering property called neural collapse. First, we empirically show that neural collapse generally strengthens as the number of layers increases. In addition, we demonstrate that neural collapse extends beyond the penultimate layer and emerges in intermediate layers as well, rendering the higher layers essentially redundant. We characterize a notion of effective depth, which measures the minimal layer at which neural collapse occurs. Building on this, we hypothesize and empirically show that gradient descent implicitly selects neural networks of small effective depth. Finally, we show, both theoretically and empirically, that the effective depth of a trained neural network increases monotonically as the portion of random labels in the training data grows, and we connect this behavior to generalization.
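To make the notion concrete, the following is a hedged sketch of one possible formalization; the particular collapse measure, the notation (f_l, mu_{l,c}, Var_{l,c}, C, epsilon), and the thresholding rule are illustrative assumptions and are not quoted from the abstract. Writing f_l for the composition of the first l layers, mu_{l,c} and Var_{l,c} for the mean and variance of f_l over samples of class c, and C for the number of classes, a per-layer collapse measure and the corresponding effective depth could be defined as

\[
\mathrm{NC}(l) \;=\; \binom{C}{2}^{-1} \sum_{c < c'} \frac{\mathrm{Var}_{l,c} + \mathrm{Var}_{l,c'}}{2\,\lVert \mu_{l,c} - \mu_{l,c'} \rVert^{2}},
\qquad
\mathrm{EffDepth}_{\epsilon}(f) \;=\; \min\{\, l \;:\; \mathrm{NC}(l) \le \epsilon \,\}.
\]

Under this reading, neural collapse at layer l corresponds to NC(l) being close to zero, and a small effective depth means the class clusters already form well below the penultimate layer, so the layers above it act on features that are essentially collapsed.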