We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive, on few-shot learning problems, with representations learned by special-purpose algorithms designed for such problems. We offer an explanation for this phenomenon based on the concept of class-features variability collapse, which refers to the training dynamics of deep classification networks where the feature embeddings of samples belonging to the same class tend to concentrate around their class means. More specifically, we examine the few-shot error of the learned feature map, which is the classification error of the nearest class-center classifier using centers learned from a small number of random samples from each class. Assuming that the classes appearing in the data are selected independently from a distribution, we show that the few-shot error generalizes from the training data to unseen test data, and we provide an upper bound on the expected few-shot error for new classes (selected from the same distribution) in terms of the average few-shot error over the source classes. Additionally, we show that the few-shot error on the training data can be upper bounded using the degree of class-features variability collapse. This suggests that foundation models can provide feature maps that are transferable to new downstream tasks even with limited data available.
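To make the two quantities in the abstract concrete, the following is a minimal sketch, assuming feature embeddings are given as NumPy arrays keyed by class id: a nearest class-center few-shot classifier whose centers are estimated from a handful of support samples, and a simple within-class variability measure normalized by the distance between class means (small values indicate stronger collapse). The function names and the exact form of the collapse measure are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def nearest_class_center_error(support, query_feats, query_labels):
    """Few-shot error of a fixed feature map: centers are estimated from a
    small support set and each query is assigned to the nearest center.

    support:      dict {class id -> array (n_shots, d)} of support embeddings
    query_feats:  array (m, d) of query embeddings
    query_labels: array (m,) of ground-truth class ids
    """
    classes = sorted(support)
    # Estimate each class center from its few support embeddings.
    centers = np.stack([support[c].mean(axis=0) for c in classes])
    # Assign each query to the class with the nearest center (Euclidean distance).
    dists = np.linalg.norm(query_feats[:, None, :] - centers[None, :, :], axis=-1)
    preds = np.array(classes)[dists.argmin(axis=1)]
    return float((preds != query_labels).mean())


def class_variability_collapse(feats_by_class):
    """Illustrative collapse measure (an assumption, not the paper's exact one):
    average over class pairs of the within-class variance divided by the squared
    distance between the two class means."""
    classes = sorted(feats_by_class)
    means = {c: feats_by_class[c].mean(axis=0) for c in classes}
    variances = {c: feats_by_class[c].var(axis=0).sum() for c in classes}
    ratios = []
    for i, c1 in enumerate(classes):
        for c2 in classes[i + 1:]:
            gap = np.linalg.norm(means[c1] - means[c2]) ** 2
            ratios.append((variances[c1] + variances[c2]) / (2.0 * gap))
    return float(np.mean(ratios))
```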