Out-of-distribution (OOD) generalization, where the model needs to handle distribution shifts from training, is a major challenge of machine learning. Recently, contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, revealing a promising path toward OOD generalization. However, to improve upon zero-shot performance, further adaptation of CLIP on downstream tasks is indispensable but undesirably degrades OOD generalization ability. In this paper, we aim to generalize CLIP to out-of-distribution test data on downstream tasks. Beyond the two canonical OOD situations, domain shift and open class, we tackle a more general but difficult in-the-wild setting where both OOD situations may occur in the unseen test data. We propose CLIPood, a simple fine-tuning method that can adapt CLIP models to all OOD situations. To exploit semantic relations between classes from the text modality, CLIPood introduces a new training objective, margin metric softmax (MMS), with class-adaptive margins for fine-tuning. Moreover, to incorporate both the pre-trained zero-shot model and the fine-tuned task-adaptive model, CLIPood proposes a new Beta moving average (BMA) to maintain a temporal ensemble according to a Beta distribution. Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
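The abstract only names the MMS objective, so the following is a minimal sketch of a margin-metric-softmax-style loss, assuming PyTorch, L2-normalized CLIP image/text features, and hypothetical hyper-parameters `tau` and `lam`; the exact margin form in CLIPood may differ.

```python
import torch
import torch.nn.functional as F

def margin_metric_softmax_loss(image_feats, text_feats, labels, tau=0.01, lam=0.3):
    """Sketch of a softmax loss with class-adaptive margins.

    image_feats: (B, D) L2-normalized image embeddings
    text_feats:  (C, D) L2-normalized class text embeddings
    labels:      (B,)   ground-truth class indices
    tau, lam:    assumed temperature and margin-scale hyper-parameters
    """
    logits = image_feats @ text_feats.t() / tau        # (B, C) cosine logits

    # Class-adaptive margins derived from text-side semantic similarity:
    # classes whose text embeddings are far from the ground-truth class get
    # a larger margin, semantically close classes a smaller one, and the
    # ground-truth class itself gets zero margin.
    class_sim = text_feats @ text_feats.t()            # (C, C) in [-1, 1]
    margins = lam * (1.0 - class_sim[labels])          # (B, C)

    return F.cross_entropy(logits + margins, labels)
```

Adding the margin to non-target logits forces the true-class score to beat dissimilar classes by a wider gap than similar ones, which is one plausible way to encode the text-modality class relations the abstract describes.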
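For the BMA ensemble, a minimal sketch of a Beta-weighted temporal average of model weights is given below, assuming PyTorch; the class name, the online-update form, and the choice of a symmetric Beta(alpha, alpha) density are assumptions, not the paper's exact specification.

```python
import copy
import torch

class BetaMovingAverage:
    """Sketch of a Beta-weighted temporal ensemble of model parameters.

    Checkpoints along training are averaged with weights from a Beta(alpha, alpha)
    density over normalized training time, so that both early parameters (close to
    the zero-shot model) and late parameters (task-adapted) contribute.
    """

    def __init__(self, model, total_steps, alpha=0.5):  # alpha=0.5 is an assumed default
        self.avg_model = copy.deepcopy(model)
        self.total_steps = total_steps
        self.alpha = alpha
        self.weight_sum = 0.0

    def _beta_weight(self, step):
        # Unnormalized Beta(alpha, alpha) density at t = step / total_steps,
        # clipped away from the endpoints for numerical stability.
        t = min(max(step / self.total_steps, 1e-3), 1.0 - 1e-3)
        return (t ** (self.alpha - 1.0)) * ((1.0 - t) ** (self.alpha - 1.0))

    @torch.no_grad()
    def update(self, model, step):
        # Incremental weighted mean: avg = sum_i w_i * p_i / sum_i w_i.
        w = self._beta_weight(step)
        self.weight_sum += w
        coef = w / self.weight_sum
        for p_avg, p in zip(self.avg_model.parameters(), model.parameters()):
            p_avg.mul_(1.0 - coef).add_(p, alpha=coef)
```

With alpha < 1 the Beta density is U-shaped, which is one way to emphasize both the pre-trained zero-shot weights (early steps) and the fine-tuned weights (late steps), matching the intent stated in the abstract.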