Prompt tuning is a new few-shot transfer learning technique that tunes only the learnable prompt for pre-trained vision and language models such as CLIP. However, existing prompt tuning methods tend to learn spurious or entangled representations, which leads to poor generalization to unseen concepts. Towards non-spurious and efficient prompt learning from limited examples, this paper presents a novel \underline{\textbf{C}}ounterfactual \underline{\textbf{P}}rompt \underline{\textbf{L}}earning (CPL) method for vision and language models, which simultaneously employs counterfactual generation and contrastive learning in a joint optimization framework. In particular, CPL constructs counterfactuals by identifying the minimal non-spurious feature change between semantically similar positive and negative samples that causes a concept change, and learns more generalizable prompt representations from both factual and counterfactual examples via contrastive learning. Extensive experiments demonstrate that CPL obtains superior few-shot performance on different vision and language tasks compared with previous prompt tuning methods on CLIP. On image classification, we achieve a 3.55\% average relative improvement on unseen classes across seven datasets; on image-text retrieval and visual question answering, we gain up to 4.09\% and 25.08\% relative improvements, respectively, across three few-shot scenarios on unseen test sets.