Recent work [Papyan, Han, and Donoho, 2020] discovered a phenomenon called Neural Collapse (NC) that occurs pervasively in today's deep net training paradigm of driving cross-entropy loss towards zero. In this phenomenon, the last-layer features collapse to their class-means, both the classifiers and class-means collapse to the same Simplex Equiangular Tight Frame (ETF), and the behavior of the last-layer classifier converges to that of the nearest-class-mean decision rule. Since then, follow-ups-such as Mixon et al. [2020] and Poggio and Liao [2020a,b]-formally analyzed this inductive bias by replacing the hard-to-study cross-entropy by the more tractable mean squared error (MSE) loss. But, these works stopped short of demonstrating the empirical reality of MSE-NC on benchmark datasets and canonical networks-as had been done in Papyan, Han, and Donoho [2020] for the cross-entropy loss. In this work, we establish the empirical reality of MSE-NC by reporting experimental observations for three prototypical networks and five canonical datasets with code for reproducing NC. Following this, we develop three main contributions inspired by MSE-NC. Firstly, we show a new theoretical decomposition of the MSE loss into (A) a term assuming the last-layer classifier is exactly the least-squares or Webb and Lowe [1990] classifier and (B) a term capturing the deviation from this least-squares classifier. Secondly, we exhibit experiments on canonical datasets and networks demonstrating that, during training, term-(B) is negligible. This motivates a new theoretical construct: the central path, where the linear classifier stays MSE-optimal-for the given feature activations-throughout the dynamics. Finally, through our study of continually renormalized gradient flow along the central path, we produce closed-form dynamics that predict full Neural Collapse in an unconstrained features model.
翻译:最近的工作[Papyan, Han, and Donoho, 2020] 发现了一个名为 Neal Clofer (NC) 的现象,这种现象在今天将交叉热带损失推向零的深层净培训模式中普遍存在。 在这种现象中, 末层特征会滑落到他们的班级模式中, 分类器和班级标志会倒塌到相同的Siplexx Qearforight Tight Frameral( ETF), 末层分类器的行为会接近于最接近级级的梯度决定规则。 从那时起, 诸如 Mixon et al. [202020] 和 Pogggio 和 Liao [202020a,b] 的后期追踪, 将这种感官倾向性偏差通过更难研究的跨级模式( MSE) 的跨级模式( MI- NC) 数据流流流流化( 和直线性网络( Papyan, Han, Donho- el- el- el- etal- et- demotional- demotional- demotional- demodeal) evation) 演示。