The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Recent works have demonstrated that deep nets trained with mean squared error (MSE) loss perform comparably to those trained with CE. As a preliminary, we empirically establish that NC emerges in such MSE-trained deep nets as well, through experiments on three canonical networks and five benchmark datasets. We provide, in a Google Colab notebook, PyTorch code for reproducing MSE-NC and CE-NC: https://colab.research.google.com/github/neuralcollapse/neuralcollapse/blob/main/neuralcollapse.ipynb. The analytically tractable MSE loss offers more mathematical opportunities than the hard-to-analyze CE loss, inspiring us to leverage MSE loss towards the theoretical investigation of NC. We make three main contributions: (I) We show a new decomposition of the MSE loss into (A) terms directly interpretable through the lens of NC, which assume the last-layer classifier is exactly the least-squares classifier; and (B) a term capturing the deviation from this least-squares classifier. (II) We present experiments on canonical datasets and networks demonstrating that term-(B) is negligible during training. This motivates us to introduce a new theoretical construct: the central path, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. (III) By studying renormalized gradient flow along the central path, we derive exact dynamics that predict NC.
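The NC signatures described above (within-class features collapsing to their class-means, and centered class-means forming a Simplex Equiangular Tight Frame with pairwise cosines of -1/(C-1)) can be measured directly from last-layer activations. Below is a minimal NumPy sketch, assuming features have already been extracted from a trained network; the function names and the synthetic check are illustrative, not the paper's code:

```python
import numpy as np

def simplex_etf(C, d, seed=0):
    """Construct the vertices of an ideal Simplex ETF with C classes in R^d (d >= C)."""
    # M = sqrt(C/(C-1)) * (I - 11^T / C): unit-norm columns, pairwise cosine -1/(C-1).
    M = np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)
    rng = np.random.default_rng(seed)
    P, _ = np.linalg.qr(rng.standard_normal((d, C)))  # orthonormal embedding into R^d
    return (P @ M).T                                  # shape (C, d): one vertex per class

def nc_metrics(features, labels):
    """Return (mean within-class deviation, off-diagonal cosines of centered class-means).
    NC predicts the first -> 0 and the second -> -1/(C-1)."""
    classes = np.unique(labels)
    mu = np.stack([features[labels == c].mean(axis=0) for c in classes])
    M = mu - mu.mean(axis=0)                          # centered class-means
    within = np.mean([np.linalg.norm(features[labels == c] - mu[i], axis=1).mean()
                      for i, c in enumerate(classes)])
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    cos = Mn @ Mn.T
    off_diag = cos[~np.eye(len(classes), dtype=bool)]
    return within, off_diag

# Sanity check on perfectly collapsed features placed at ETF vertices:
C, d = 4, 10
vertices = simplex_etf(C, d)
labels = np.repeat(np.arange(C), 5)
features = vertices[labels]
within, off_diag = nc_metrics(features, labels)
# within is ~0 and every off-diagonal cosine is ~ -1/(C-1) = -1/3
```

On real training runs, `within` shrinks toward zero and `off_diag` concentrates around -1/(C-1) as training progresses past zero error, which is the empirical behavior the abstract refers to.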