As model size continues to grow and access to labeled training data remains limited, transfer learning has become a popular approach in many scientific and engineering fields. This study explores the phenomenon of neural collapse (NC) in transfer learning for classification problems, in which the last-layer features of deep networks exhibit zero within-class variability while the class means and classifiers become maximally and equally separated. Through the lens of NC, this work makes the following findings on transfer learning: (i) preventing within-class variability collapse to a certain extent during model pre-training on source data leads to better transferability, as it better preserves the intrinsic structure of the input data; (ii) obtaining features with more NC on downstream data during fine-tuning results in better test accuracy. These results provide new insight into commonly used heuristics in model pre-training, such as loss design, data augmentation, and projection heads, and lead to more efficient and principled methods for fine-tuning large pre-trained models. Compared to full-model fine-tuning, our proposed fine-tuning methods achieve comparable or even better performance while reducing the number of fine-tuned parameters by at least 70% and alleviating overfitting.
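To make the notion of within-class variability collapse concrete, the sketch below computes one commonly used NC measure: the trace ratio between the within-class and between-class covariance matrices of the last-layer features, which shrinks toward zero as features collapse to their class means. This is a minimal illustration under assumed NumPy inputs, not necessarily the exact metric used in this work; the function name is chosen for clarity.

```python
import numpy as np

def within_class_variability(features, labels):
    """Illustrative NC1-style metric: trace(Sigma_W @ pinv(Sigma_B)) / K.

    Smaller values indicate stronger within-class variability collapse.

    features: (n_samples, d) array of last-layer features.
    labels:   (n_samples,) array of integer class labels.
    """
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)

    d = features.shape[1]
    sigma_w = np.zeros((d, d))  # within-class covariance
    sigma_b = np.zeros((d, d))  # between-class covariance
    for c in classes:
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        centered = class_feats - class_mean            # within-class deviations
        sigma_w += centered.T @ centered / len(features)
        diff = (class_mean - global_mean)[:, None]     # between-class deviation
        sigma_b += diff @ diff.T / len(classes)

    return np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes)
```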
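The reported parameter reduction suggests updating only a small portion of the pre-trained model during fine-tuning. The following PyTorch sketch shows one generic form of such parameter-efficient fine-tuning, freezing the backbone except for its last residual stage and a new classifier head; the choice of ResNet-50 and of which layers to unfreeze are illustrative assumptions, not the specific method proposed in this work.

```python
import torch
import torchvision

# Load a pre-trained backbone (illustrative choice of architecture).
model = torchvision.models.resnet50(weights="IMAGENET1K_V2")

# Freeze all parameters, then unfreeze only the last residual stage,
# so most of the network is left untouched during fine-tuning.
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True

# Replace the classifier head for the downstream task (e.g., 10 classes);
# the new head is trainable by default.
model.fc = torch.nn.Linear(model.fc.in_features, 10)

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)

total = sum(p.numel() for p in model.parameters())
tuned = sum(p.numel() for p in trainable)
print(f"fine-tuning {tuned / total:.1%} of the parameters")
```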