Modern deep neural networks have achieved impressive performance on tasks ranging from image classification to natural language processing. Surprisingly, these complex systems with massive numbers of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when trained to convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and that those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse ($\mathcal{NC}$). Recent papers have theoretically shown that $\mathcal{NC}$ emerges at the global minimizers of training problems under the simplified ``unconstrained feature model''. In this context, we take a step further and prove that $\mathcal{NC}$ occurs in deep linear networks for the popular mean squared error (MSE) and cross-entropy (CE) losses, showing that global solutions exhibit $\mathcal{NC}$ properties across the linear layers. Furthermore, we extend our study to imbalanced data for the MSE loss and present the first geometric analysis of $\mathcal{NC}$ in the bias-free setting. Our results demonstrate that the last-layer features and classifiers converge to a geometry consisting of orthogonal vectors whose lengths depend on the number of training samples in their corresponding classes. Finally, we empirically validate our theoretical analyses on synthetic and practical network architectures in both balanced and imbalanced scenarios.
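For reference, a simplex ETF can be written (up to rotation and scaling) as the columns of the matrix below; the notation here ($K$ classes, feature dimension $d$, partial orthogonal matrix $\mathbf{P}$) is a standard illustrative convention rather than this paper's own symbols:
$$
\mathbf{M} \;=\; \sqrt{\tfrac{K}{K-1}}\,\mathbf{P}\!\left(\mathbf{I}_K - \tfrac{1}{K}\mathbf{1}_K\mathbf{1}_K^{\top}\right),
\qquad \mathbf{P} \in \mathbb{R}^{d \times K},\ \mathbf{P}^{\top}\mathbf{P} = \mathbf{I}_K,
$$
so that any two columns $\mathbf{m}_i, \mathbf{m}_j$ ($i \neq j$) have equal norms and pairwise cosine similarity $-\tfrac{1}{K-1}$, the maximal equiangular separation achievable by $K$ vectors.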