Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability on downstream vision tasks when given appropriate text prompts. Instead of designing prompts manually, Context Optimization (CoOp) has recently been proposed to learn continuous prompts from task-specific training data. Despite the performance improvements on downstream tasks, several studies have reported that CoOp suffers from overfitting in two respects: (i) the test accuracy on base classes first improves and then worsens during training; (ii) the test accuracy on novel classes keeps decreasing. However, none of the existing studies explains or mitigates these overfitting problems. In this study, we first explore the cause of overfitting by analyzing the gradient flow. Comparative experiments reveal that CoOp favors generalizable features in the early training stage and spurious features in the later stage, which accounts for the absence and the subsequent onset of overfitting. Given these observations, we propose Subspace Prompt Tuning (SubPT), which projects the back-propagated gradients onto the low-rank subspace spanned by the eigenvectors of the early-stage gradient flow throughout the entire training process, and thereby eliminates the overfitting problem. In addition, we equip CoOp with a Novel Feature Learner (NFL) to enhance the generalization of the learned prompts to novel categories beyond the training set, without requiring any image training data. Extensive experiments on 11 classification datasets demonstrate that SubPT+NFL consistently boosts the performance of CoOp and outperforms the state-of-the-art CoCoOp approach. Experiments on more challenging downstream vision tasks, including open-vocabulary object detection and zero-shot semantic segmentation, further verify the effectiveness of the proposed method. Code is available at https://tinyurl.com/mpe64f89.
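To make the core mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of the gradient-projection idea described above: collect the prompt gradients from an early training phase, extract the top eigenvectors of that gradient matrix, and project all later gradients onto the resulting low-rank subspace before each optimizer step. All names here (build_subspace, project_grad, collect_steps, the placeholder loss) are illustrative assumptions and not the authors' released implementation.

```python
import torch

def build_subspace(grad_history, top_k=4):
    """Stack flattened early-stage gradients and return the top-k right
    singular vectors, which span the dominant early gradient-flow subspace."""
    G = torch.stack([g.flatten() for g in grad_history])      # (steps, dim)
    _, _, Vh = torch.linalg.svd(G, full_matrices=False)
    return Vh[:top_k].T                                        # (dim, top_k)

def project_grad(grad, basis):
    """Project a gradient onto the subspace spanned by the columns of `basis`."""
    flat = grad.flatten()
    proj = basis @ (basis.T @ flat)
    return proj.view_as(grad)

# Illustrative training loop around a learnable context prompt `ctx`
# (the real loss would be the CLIP-based classification loss).
ctx = torch.randn(16, 512, requires_grad=True)                 # prompt tokens
optimizer = torch.optim.SGD([ctx], lr=2e-3)
grad_history, basis = [], None
collect_steps = 50                                             # early-stage length (assumed)

for step in range(1000):
    loss = (ctx ** 2).mean()                                   # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    if step < collect_steps:
        # Early stage: record gradients, then build the subspace once.
        grad_history.append(ctx.grad.detach().clone())
        if step == collect_steps - 1:
            basis = build_subspace(grad_history)
    elif basis is not None:
        # Later stage: keep only the gradient component inside the subspace.
        ctx.grad.data = project_grad(ctx.grad.detach(), basis)
    optimizer.step()
```

Under the abstract's analysis, restricting updates to this subspace keeps later training aligned with the generalizable early-stage directions while discarding the spurious components that drive overfitting.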