Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero while the training loss is pushed towards zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call Neural Collapse, involving four deeply interconnected phenomena: (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class-means; (NC2) The class-means collapse to the vertices of a Simplex Equiangular Tight Frame (ETF); (NC3) Up to rescaling, the last-layer classifiers collapse to the class-means, or in other words to the Simplex ETF, i.e. to a self-dual configuration; (NC4) For a given activation, the classifier's decision collapses to simply choosing whichever class has the closest train class-mean, i.e. the Nearest Class Center (NCC) decision rule. The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
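To make the four phenomena concrete, the following is a brief sketch in notation standard for this setting; the symbols are introduced here only for illustration: $h_{i,c}$ denotes the last-layer activation of training example $i$ in class $c$, $\mu_c$ the train class-mean, $\mu_G$ the global mean, $C$ the number of classes, and $(w_c, b_c)$ the last-layer linear classifier for class $c$.
\[
\text{(NC1)} \quad \Sigma_W \;=\; \operatorname{Ave}_{i,c}\,\big(h_{i,c}-\mu_c\big)\big(h_{i,c}-\mu_c\big)^\top \;\longrightarrow\; 0,
\]
\[
\text{(NC2)} \quad \frac{\|\mu_c-\mu_G\|_2}{\|\mu_{c'}-\mu_G\|_2} \;\longrightarrow\; 1
\quad \text{and} \quad
\cos\angle\big(\mu_c-\mu_G,\;\mu_{c'}-\mu_G\big) \;\longrightarrow\; -\tfrac{1}{C-1} \quad (c\neq c'),
\]
\[
\text{(NC3)} \quad \frac{w_c}{\|w_c\|_2} \;\longrightarrow\; \frac{\mu_c-\mu_G}{\|\mu_c-\mu_G\|_2},
\qquad
\text{(NC4)} \quad \arg\max_{c}\,\big(\langle w_c, h\rangle + b_c\big) \;\longrightarrow\; \arg\min_{c}\,\|h-\mu_c\|_2 .
\]
The equal norms and maximal equal pairwise angles in (NC2) are exactly the defining properties of the vertices of a Simplex ETF, and (NC4) states that the trained classifier's decisions coincide with the Nearest Class Center rule applied to the train class-means.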