The notion of neural collapse refers to several emergent phenomena that have been observed empirically across various canonical classification problems. During the terminal phase of training a deep neural network, the feature embeddings of all examples of the same class tend to collapse to a single representation, while the features of different classes tend to separate as much as possible. Neural collapse is often studied through a simplified model, called the unconstrained feature representation, in which the network is assumed to have "infinite expressivity" and can map each data point to any arbitrary representation. In this work, we propose a more realistic variant of the unconstrained feature representation that accounts for the limited expressivity of the network. Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of the neural collapse. Using a model of the memorization-dilation (M-D) phenomenon, we exhibit one mechanism by which different losses lead to different performance of the trained network on noisy data. Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.
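The within-class collapse described above is commonly quantified by comparing within-class feature scatter to between-class separation. The following is a minimal, hedged sketch of such a metric on synthetic features; the metric `nc1_metric` and the synthetic feature layout are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def nc1_metric(features, labels):
    """Proxy for neural collapse (NC1): tr(Sigma_W @ pinv(Sigma_B)) / K,
    where Sigma_W is the within-class and Sigma_B the between-class
    covariance. Smaller values indicate stronger collapse."""
    classes = np.unique(labels)
    K = len(classes)
    d = features.shape[1]
    global_mean = features.mean(axis=0)
    Sigma_W = np.zeros((d, d))
    Sigma_B = np.zeros((d, d))
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        diff = fc - mu_c
        Sigma_W += diff.T @ diff / len(features)
        dev = (mu_c - global_mean)[:, None]
        Sigma_B += dev @ dev.T / K
    return np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / K

rng = np.random.default_rng(0)
means = rng.normal(size=(3, 5))            # 3 hypothetical class means in 5-d
labels = np.repeat(np.arange(3), 100)

# "Early training": features scattered around class means;
# "terminal phase": features nearly collapsed onto class means.
early = means[labels] + 1.0 * rng.normal(size=(300, 5))
late = means[labels] + 0.01 * rng.normal(size=(300, 5))

print(nc1_metric(late, labels) < nc1_metric(early, labels))  # collapse shrinks NC1
```

Under this toy setup, shrinking the within-class noise drives the metric toward zero, mirroring the collapse of same-class embeddings to a single representation during the terminal phase of training.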