Neural collapse is a highly symmetric geometric pattern that emerges in neural networks during the terminal phase of training, with profound implications for the generalization performance and robustness of the trained networks. To understand how the last-layer features and classifiers exhibit this recently discovered implicit bias, in this paper we introduce a surrogate model called the unconstrained layer-peeled model (ULPM). We prove that gradient flow on this model converges to critical points of a minimum-norm separation problem whose global minimizer exhibits neural collapse. Moreover, we show that the ULPM with the cross-entropy loss has a benign global loss landscape, which allows us to prove that all critical points are strict saddle points except for the global minimizers, which exhibit the neural collapse phenomenon. Empirically, we show that our results also hold during the training of neural networks on real-world tasks when explicit regularization or weight decay is not used.
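To make the setting concrete, below is a minimal numpy sketch of the unconstrained layer-peeled setup described above: the last-layer features and the linear classifier are treated as free variables and trained jointly by plain gradient descent (a discretization of the gradient flow) on the cross-entropy loss, with no explicit regularization or weight decay. All dimensions and hyperparameters (`K`, `n`, `d`, `lr`, `steps`) are illustrative assumptions rather than values from the paper; the diagnostics at the end check the neural collapse signature, i.e., vanishing relative within-class variability and class means forming a simplex equiangular tight frame (pairwise cosines near -1/(K-1)).

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
K, n, d = 4, 32, 16                      # classes, samples per class, feature dim
rng = np.random.default_rng(0)

# Unconstrained last-layer features H (one column per sample) and classifier W:
# both are free optimization variables in the layer-peeled model.
H = rng.normal(scale=0.1, size=(d, K * n))
W = rng.normal(scale=0.1, size=(K, d))
labels = np.repeat(np.arange(K), n)
Y = np.eye(K)[labels].T                  # one-hot targets, shape (K, K*n)

lr, steps = 0.5, 20_000
for _ in range(steps):
    logits = W @ H                                   # (K, K*n)
    logits -= logits.max(axis=0, keepdims=True)      # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=0, keepdims=True)                # softmax probabilities
    G = (P - Y) / (K * n)                            # d(mean CE loss)/d(logits)
    # Plain gradient step on both blocks; no regularization, so norms diverge
    # slowly while the direction converges.
    H -= lr * (W.T @ G)
    W -= lr * (G @ H.T)

# Neural collapse diagnostics on the trained iterates.
means = np.stack([H[:, labels == k].mean(axis=1) for k in range(K)])   # (K, d)
centered = means - means.mean(axis=0)
M = centered / np.linalg.norm(centered, axis=1, keepdims=True)
cos = M @ M.T                            # off-diagonals -> -1/(K-1) for a simplex ETF
within = np.mean([np.var(H[:, labels == k], axis=1).sum() for k in range(K)])
between = np.mean(np.linalg.norm(centered, axis=1) ** 2)
print("target off-diagonal cosine -1/(K-1) =", -1 / (K - 1))
print(np.round(cos, 3))
print("relative within-class variability:", within / between)
```

Under these assumptions, the off-diagonal cosines should approach -1/(K-1) and the relative within-class variability should shrink toward zero as training continues, consistent with the divergent gradient-flow trajectories whose directions exhibit neural collapse.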