Pre-trained Vision-Language Models (VLMs) such as CLIP have shown impressive generalization capability on downstream vision tasks when given appropriate text prompts. Instead of designing prompts manually, Context Optimization (CoOp) has recently been proposed to learn continuous prompts from task-specific training data. Despite the performance improvements on downstream tasks, several studies have reported that CoOp suffers from overfitting in two respects: (i) the test accuracy on base classes first improves and then degrades during training; (ii) the test accuracy on novel classes keeps decreasing. However, none of the existing studies can effectively understand and mitigate this overfitting problem. In this paper, we first explore the cause of overfitting by analyzing the gradient flow. Comparative experiments reveal that CoOp favors generalizable features in the early training stage and spurious features in the later stage, leading to the non-overfitting and overfitting phenomena, respectively. Given these observations, we propose Subspace Prompt Tuning (SubPT), which projects the back-propagated gradients onto the low-rank subspace spanned by the early-stage gradient-flow eigenvectors throughout training, and successfully eliminates the overfitting problem. In addition, we equip CoOp with a Novel Feature Learner (NFL) to enhance the generalization of the learned prompts to novel categories beyond the training set, without requiring any image training data. Extensive experiments on 11 classification datasets demonstrate that SubPT+NFL consistently boosts the performance of CoOp and outperforms the state-of-the-art approach CoCoOp. Experiments on more challenging downstream vision tasks, including open-vocabulary object detection and zero-shot semantic segmentation, further verify the effectiveness of the proposed method. Code can be found at https://tinyurl.com/mpe64f89.
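To make the gradient-projection idea concrete, the following is a minimal sketch in PyTorch, assuming a simplified setting and hypothetical function and variable names (build_subspace, project_gradient, early_grads) that are not taken from the released code: a low-rank basis is estimated from early-stage prompt gradients via SVD, and later gradients are projected onto that basis before each optimizer update.

import torch

def build_subspace(early_grads, rank=4):
    """early_grads: list of flattened prompt-gradient vectors collected in early training."""
    G = torch.stack(early_grads, dim=0)           # (num_steps, dim)
    # Right-singular vectors of G span the dominant early-stage gradient directions.
    _, _, Vh = torch.linalg.svd(G, full_matrices=False)
    return Vh[:rank]                              # (rank, dim) orthonormal basis

def project_gradient(grad, basis):
    """Project a flattened gradient onto the low-rank subspace."""
    coeffs = basis @ grad                         # (rank,)
    return basis.t() @ coeffs                     # back to (dim,)

# Schematic use inside a training loop (illustrative only):
# if step < warmup_steps:
#     early_grads.append(prompt.grad.detach().flatten().clone())
# else:
#     if basis is None:
#         basis = build_subspace(early_grads, rank=4)
#     projected = project_gradient(prompt.grad.flatten(), basis)
#     prompt.grad.copy_(projected.view_as(prompt.grad))
# optimizer.step()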