Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability on downstream vision tasks when given appropriate text prompts. Instead of designing prompts manually, Context Optimization (CoOp) has recently been proposed to learn continuous prompts from task-specific training data. Despite the performance improvements on downstream tasks, several studies have reported that CoOp suffers from overfitting in two respects: (i) the test accuracy on base classes first improves and then worsens during training; (ii) the test accuracy on novel classes keeps decreasing. However, none of the existing studies explains or mitigates these overfitting problems. In this study, we first explore the cause of overfitting by analyzing the gradient flow. Comparative experiments reveal that CoOp favors generalizable features in the early training stage and spurious features in the later stage, which accounts for the absence and the subsequent onset of overfitting. Given these observations, we propose Subspace Prompt Tuning (SubPT), which projects the back-propagated gradients onto the low-rank subspace spanned by the eigenvectors of the early-stage gradient flow throughout the entire training process, and thereby eliminates the overfitting problem. In addition, we equip CoOp with a Novel Feature Learner (NFL) to enhance the generalization of the learned prompts to novel categories beyond the training set, without requiring any image training data. Extensive experiments on 11 classification datasets demonstrate that SubPT+NFL consistently boosts the performance of CoOp and outperforms the state-of-the-art CoCoOp approach. Experiments on more challenging downstream vision tasks, including open-vocabulary object detection and zero-shot semantic segmentation, further verify the effectiveness of the proposed method. Code is available at https://tinyurl.com/mpe64f89.
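To make the core mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of the gradient-projection idea described above: collect the prompt gradients from an early training phase, extract the top eigenvectors of that gradient matrix, and project all later gradients onto the resulting low-rank subspace before each optimizer step. All names here (build_subspace, project_grad, collect_steps, the placeholder loss) are illustrative assumptions and not the authors' released implementation.

```python
import torch

def build_subspace(grad_history, top_k=4):
    """Stack flattened early-stage gradients and return the top-k right
    singular vectors, which span the dominant early gradient-flow subspace."""
    G = torch.stack([g.flatten() for g in grad_history])      # (steps, dim)
    _, _, Vh = torch.linalg.svd(G, full_matrices=False)
    return Vh[:top_k].T                                        # (dim, top_k)

def project_grad(grad, basis):
    """Project a gradient onto the subspace spanned by the columns of `basis`."""
    flat = grad.flatten()
    proj = basis @ (basis.T @ flat)
    return proj.view_as(grad)

# Illustrative training loop around a learnable context prompt `ctx`
# (the real loss would be the CLIP-based classification loss).
ctx = torch.randn(16, 512, requires_grad=True)                 # prompt tokens
optimizer = torch.optim.SGD([ctx], lr=2e-3)
grad_history, basis = [], None
collect_steps = 50                                             # early-stage length (assumed)

for step in range(1000):
    loss = (ctx ** 2).mean()                                   # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    if step < collect_steps:
        # Early stage: record gradients, then build the subspace once.
        grad_history.append(ctx.grad.detach().clone())
        if step == collect_steps - 1:
            basis = build_subspace(grad_history)
    elif basis is not None:
        # Later stage: keep only the gradient component inside the subspace.
        ctx.grad.data = project_grad(ctx.grad.detach(), basis)
    optimizer.step()
```

Under the abstract's analysis, restricting updates to this subspace keeps later training aligned with the generalizable early-stage directions while discarding the spurious components that drive overfitting.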