When training overparameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon. More specifically, for the output features of the penultimate layer, the within-class features of each class converge to their class mean, and the means of different classes exhibit a certain tight frame structure, which is also aligned with the last layer's classifier. As feature normalization in the last layer has become a common practice in modern representation learning, in this work we theoretically justify the neural collapse phenomenon for normalized features. Based on an unconstrained feature model, we simplify the empirical loss function of a multi-class classification task into a nonconvex optimization problem over a Riemannian manifold by constraining all features and classifiers to the sphere. In this context, we analyze the landscape of the Riemannian optimization problem over the product of spheres, showing that the global landscape is benign in the sense that the only global minimizers are the neural collapse solutions, while all other critical points are strict saddles with negative curvature. Experimental results on practical deep networks corroborate our theory and demonstrate that better representations can be learned faster via feature normalization.
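To make the problem setup concrete, the following is a minimal sketch rather than the paper's exact formulation or hyperparameters: it instantiates an unconstrained feature model in PyTorch, keeps the classifiers and features on the unit sphere by normalizing them in the forward pass (a simple surrogate for Riemannian optimization over the product of spheres), minimizes the cross-entropy loss with an assumed temperature parameter, and then inspects the neural-collapse statistics (within-class spread, pairwise cosines of the class means, and classifier alignment). The names, dimensions, and temperature value below are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (assumed dimensions and temperature, not the paper's setup):
# unconstrained feature model with K classes and n samples per class, where
# both the classifier vectors and the features are kept on the unit sphere by
# normalizing them inside the forward pass.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K, n, d = 10, 20, 64          # classes, samples per class, feature dimension
tau = 0.05                    # assumed temperature for the cosine logits

# Free variables: one classifier per class, one feature per sample.
W = torch.randn(K, d, requires_grad=True)        # last-layer classifiers
H = torch.randn(K, n, d, requires_grad=True)     # unconstrained features
labels = torch.arange(K).repeat_interleave(n)    # class index of each sample

opt = torch.optim.SGD([W, H], lr=0.5)
for step in range(2000):
    opt.zero_grad()
    Wn = F.normalize(W, dim=-1)                  # classifiers on the sphere
    Hn = F.normalize(H, dim=-1).reshape(K * n, d)  # features on the sphere
    logits = Hn @ Wn.t() / tau                   # cosine-similarity logits
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    opt.step()

# Inspect the neural-collapse structure of the learned solution.
with torch.no_grad():
    Hn = F.normalize(H, dim=-1)
    means = F.normalize(Hn.mean(dim=1), dim=-1)            # class means on the sphere
    within = (Hn - means[:, None, :]).norm(dim=-1).mean()  # within-class variability
    gram = means @ means.t()                               # pairwise cosines of class means
    off_diag = gram[~torch.eye(K, dtype=torch.bool)]
    align = (means * F.normalize(W, dim=-1)).sum(-1).mean()
    print(f"loss={loss.item():.4f}  within-class spread={within:.4f}")
    print(f"mean cosine between class means={off_diag.mean():.4f} "
          f"(simplex-ETF reference value: {-1.0 / (K - 1):.4f})")
    print(f"classifier/class-mean alignment={align:.4f}")
```

Near a neural collapse solution, the within-class spread should shrink toward zero, the class-mean cosines should approach the simplex-ETF reference value (one canonical tight-frame configuration when the feature dimension is large enough), and the classifier/class-mean alignment should approach one; these statistics are only diagnostics for the sketch, not reported results from the paper.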