With ever-growing model sizes and the limited availability of labeled training data, transfer learning has become an increasingly popular approach in many science and engineering domains. For classification problems, this work delves into the mystery of transfer learning through an intriguing phenomenon termed neural collapse (NC), where the last-layer features and classifiers of learned deep networks satisfy: (i) the within-class variability of the features collapses to zero, and (ii) the between-class feature means are maximally and equally separated. Through the lens of NC, our findings for transfer learning are the following: (i) when pre-training models, preventing within-class variability collapse (to a certain extent) better preserves the intrinsic structure of the input data and thus leads to better model transferability; (ii) when fine-tuning models on downstream tasks, obtaining features exhibiting more NC on the downstream data results in better test accuracy on the given task. The above results not only demystify many widely used heuristics in model pre-training (e.g., data augmentation, projection heads, self-supervised learning), but also lead to a more efficient and principled fine-tuning method for downstream tasks, which we demonstrate through extensive experimental results.
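The within-class variability collapse described in condition (i) is commonly quantified by comparing the within-class covariance $\Sigma_W$ against the between-class covariance $\Sigma_B$, e.g., via $\mathrm{tr}(\Sigma_W \Sigma_B^{\dagger})/K$ for $K$ classes. As a minimal, illustrative NumPy sketch (the function name and setup are our own, not the paper's implementation):

```python
import numpy as np

def within_class_variability(features, labels):
    """NC1-style metric: trace(Sigma_W @ pinv(Sigma_B)) / K.
    Approaches zero when each sample collapses to its class mean."""
    classes = np.unique(labels)
    K = len(classes)
    n, d = features.shape
    global_mean = features.mean(axis=0)
    Sigma_W = np.zeros((d, d))  # within-class covariance
    Sigma_B = np.zeros((d, d))  # between-class covariance
    for c in classes:
        X = features[labels == c]
        mu = X.mean(axis=0)
        centered = X - mu
        Sigma_W += centered.T @ centered / n
        diff = (mu - global_mean)[:, None]
        Sigma_B += (diff @ diff.T) / K
    # Pseudo-inverse since Sigma_B is rank-deficient (rank <= K - 1)
    return np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / K

# Toy check: fully collapsed features give ~0; adding noise raises the metric.
rng = np.random.default_rng(0)
class_means = rng.normal(size=(3, 5))
labels = np.repeat(np.arange(3), 10)
collapsed = class_means[labels]                 # every sample == its class mean
noisy = collapsed + 0.5 * rng.normal(size=collapsed.shape)
print(within_class_variability(collapsed, labels))  # ~ 0 (full collapse)
print(within_class_variability(noisy, labels))      # strictly larger
```

Under this metric, the paper's two findings correspond to keeping the value moderately large during pre-training and driving it toward zero during fine-tuning on the downstream data.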