We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive, on few-shot learning problems, with representations learned by special-purpose algorithms designed for such problems. In this paper we provide an explanation for this behavior based on the recently observed phenomenon that the features learned by overparameterized classification networks show an interesting clustering property, called neural collapse. We demonstrate both theoretically and empirically that neural collapse generalizes to new samples from the training classes, and -- more importantly -- to new classes as well, allowing foundation models to provide feature maps that work well in transfer learning and, specifically, in the few-shot setting.
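To make the clustering property concrete, the following is a minimal sketch of how one might quantify neural collapse on held-out features and exploit it for few-shot transfer with a nearest-class-mean classifier. The function names, the within-class-variance-to-between-class-distance ratio used as the collapse measure, and the synthetic data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def collapse_ratio(features, labels):
    """Average per-class feature variance divided by the mean squared
    distance between class means; values near zero indicate that features
    cluster tightly around their class means (neural collapse)."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    variances = np.array([features[labels == c].var(axis=0).sum() for c in classes])
    ratios = []
    for i in range(len(classes)):
        dists = np.array([np.sum((means[i] - means[j]) ** 2)
                          for j in range(len(classes)) if j != i])
        ratios.append(variances[i] / dists.mean())
    return float(np.mean(ratios))

def nearest_class_mean_predict(support_feats, support_labels, query_feats):
    """Few-shot classification: assign each query point to the class whose
    mean (computed from the labelled support examples) is closest."""
    classes = np.unique(support_labels)
    means = np.stack([support_feats[support_labels == c].mean(axis=0)
                      for c in classes])
    dists = ((query_feats[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "collapsed" features: tight clusters around distant class means.
    class_means = rng.normal(size=(5, 16)) * 10.0
    labels = np.repeat(np.arange(5), 50)
    feats = class_means[labels] + rng.normal(size=(250, 16)) * 0.1
    print(collapse_ratio(feats, labels))            # near zero => strong collapse
    preds = nearest_class_mean_predict(feats[::10], labels[::10], feats)
    print((preds == labels).mean())                 # high accuracy under collapse
```

When collapse carries over to classes never seen in training, as the paper argues, the same nearest-class-mean rule applied to a handful of support examples per new class already yields a strong few-shot classifier on top of the frozen feature map.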