Prompt tuning, which conditions a model on learned task-specific prompt vectors, has emerged as a data-efficient and parameter-efficient method for adapting large pretrained vision-language models to multiple downstream tasks. However, existing approaches usually learn the prompt vectors for each task independently from scratch, thereby failing to exploit the rich shareable knowledge across different vision-language tasks. In this paper, we propose multitask vision-language prompt tuning (MVLPT), which incorporates cross-task knowledge into prompt tuning for vision-language models. Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; (ii) we show that many target tasks can benefit from sharing prompt vectors with one another and thus can be jointly learned via multitask prompt tuning. We benchmark the proposed MVLPT using three representative prompt tuning methods, namely text prompt tuning, visual prompt tuning, and unified vision-language prompt tuning. Results on 20 vision tasks demonstrate that the proposed approach outperforms all single-task baseline prompt tuning methods, setting a new state of the art on the few-shot ELEVATER benchmarks and cross-task generalization benchmarks. To understand where cross-task knowledge is most effective, we also conduct a large-scale study of task transferability across 20 vision tasks in 400 combinations for each prompt tuning method. It shows that the most performant MVLPT for each prompt tuning method prefers different task combinations, and that many tasks can benefit one another, depending on their visual similarity and label similarity. Code is available at https://github.com/sIncerass/MVLPT.
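The two ideas in the abstract, learning a shared prompt jointly across several tasks and then using it to initialize a target task's prompt, can be illustrated with a minimal toy sketch. This is a hypothetical NumPy setup, not the paper's implementation: the "backbone" is a frozen random linear map, the prompt is a small matrix prepended to the input embeddings, and gradients are taken by finite differences instead of backpropagation.

```python
import numpy as np

# Toy sketch of multitask prompt tuning (illustrative assumption, not MVLPT's
# actual code). A shared prompt matrix P (N_TOKENS x DIM) is prepended to each
# task's input embeddings; only P is updated while the backbone stays frozen.

rng = np.random.default_rng(0)
DIM, N_TOKENS = 8, 4

def frozen_backbone(x, W):
    # Stand-in for a frozen pretrained model: mean-pool tokens, then linear map.
    return x.mean(axis=0) @ W

def task_loss(prompt, inputs, target, W):
    # Squared error between pooled features and a per-task target vector.
    feats = frozen_backbone(np.vstack([prompt, inputs]), W)
    return float(np.sum((feats - target) ** 2))

def grad_wrt_prompt(prompt, inputs, target, W, eps=1e-4):
    # Finite-difference gradient; a real implementation would backprop.
    g = np.zeros_like(prompt)
    for i in range(prompt.shape[0]):
        for j in range(prompt.shape[1]):
            p = prompt.copy(); p[i, j] += eps
            m = prompt.copy(); m[i, j] -= eps
            g[i, j] = (task_loss(p, inputs, target, W)
                       - task_loss(m, inputs, target, W)) / (2 * eps)
    return g

W = rng.normal(size=(DIM, DIM))  # frozen backbone weights
# Three toy "source" tasks: (input embeddings, target feature vector).
tasks = [(rng.normal(size=(5, DIM)), rng.normal(size=DIM)) for _ in range(3)]

# (i) jointly learn a single shared prompt on the source tasks ...
shared = np.zeros((N_TOKENS, DIM))
for _ in range(150):
    g = sum(grad_wrt_prompt(shared, x, y, W) for x, y in tasks) / len(tasks)
    shared -= 0.01 * g

# (ii) ... then use it to initialize a new target task's prompt, instead of
# starting that task's prompt tuning from scratch.
target_prompt = shared.copy()
```

The sketch only conveys the mechanics, shared prompt parameters receiving averaged gradients from multiple tasks and then serving as a warm start; whether the transfer actually helps a given target task is, as the abstract notes, what the paper's transferability study measures.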