Domain generalization (DG) is a difficult transfer learning problem that aims to learn a model that generalizes to unseen domains. Recent foundation models (FMs) are robust to many distribution shifts and should therefore substantially improve DG performance. In this work, we study generic ways to adapt CLIP, a vision-language foundation model, to DG problems in image classification. While ERM greatly improves accuracy on standard DG benchmarks with larger backbones and training datasets, fine-tuning FMs is impractical in many real-world situations. We propose Domain Prompt Learning (DPL), a novel approach that performs domain inference through conditional prompt generation. DPL achieves a significant accuracy improvement while training only a lightweight prompt generator (a three-layer MLP), whose parameter count is comparable to that of the classification projectors used in prior DG work. Combining DPL with CLIP provides surprising performance, raising the accuracy of zero-shot CLIP from 73.7% to 79.3% on several standard datasets, namely PACS, VLCS, OfficeHome, and TerraIncognita. We hope the simplicity and success of our approach lead to broader adoption and analysis of foundation models in the domain generalization field. Our code is available at https://github.com/shogi880/DPLCLIP.
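To make the scale of the trainable component concrete, below is a minimal PyTorch sketch of a conditional prompt generator of the kind described above: a three-layer MLP that maps a batch-averaged CLIP image feature (treated as a domain representation) to continuous prompt token embeddings. All dimensions, names, and the batch-averaging choice are illustrative assumptions, not the released implementation; see the repository linked above for that.

```python
import torch
import torch.nn as nn

class DomainPromptGenerator(nn.Module):
    """Hypothetical sketch of a lightweight (three-layer MLP) prompt generator.

    Dimensions assume CLIP ViT-B/16 (512-d image features, 512-d token
    embeddings) and 16 context tokens; all are assumptions for illustration.
    """

    def __init__(self, feat_dim=512, hidden_dim=512, n_ctx=16, token_dim=512):
        super().__init__()
        self.n_ctx = n_ctx
        self.token_dim = token_dim
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, n_ctx * token_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # Average image features over the batch to form a domain
        # representation, then generate n_ctx prompt token embeddings
        # conditioned on it.
        domain_feat = image_features.mean(dim=0)      # (feat_dim,)
        ctx = self.mlp(domain_feat)                   # (n_ctx * token_dim,)
        return ctx.view(self.n_ctx, self.token_dim)   # (n_ctx, token_dim)
```

In a setup like this, the generated context vectors would be prepended to each class name's token embeddings before CLIP's frozen text encoder, so that the text-side classifier adapts to the inferred domain while only the MLP's parameters are trained.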