We investigate the efficacy of visual prompting to adapt large-scale models in vision. Following the recent approaches of prompt tuning and adversarial reprogramming, we learn a single image perturbation such that a frozen model prompted with this perturbation performs a new task. Through comprehensive experiments, we demonstrate that visual prompting is particularly effective for CLIP and robust to distribution shift, achieving performance competitive with standard linear probes. We further analyze how properties of the downstream dataset, prompt design, and output transformation affect adaptation performance. The surprising effectiveness of visual prompting provides a new perspective on adapting pre-trained models in vision. Code is available at http://hjbahng.github.io/visual_prompting.
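A minimal sketch of the prompt construction described above, assuming a padding-style prompt: a single learnable perturbation that lives on a fixed-width border and is added to every input image, which would then be fed to the frozen model. The function names and the border design are illustrative assumptions; the frozen model, optimizer, and CLIP-specific output transformation are omitted.

```python
import numpy as np

def make_border_mask(height, width, pad):
    """Boolean mask that is True on a border of width `pad`."""
    mask = np.zeros((height, width), dtype=bool)
    mask[:pad, :] = True
    mask[-pad:, :] = True
    mask[:, :pad] = True
    mask[:, -pad:] = True
    return mask

def apply_visual_prompt(images, prompt, pad):
    """Add one shared learned perturbation to the border of every image.

    images: (N, C, H, W) batch with pixel values in [0, 1]
    prompt: (C, H, W) learnable perturbation, shared across the batch
    """
    n, c, h, w = images.shape
    mask = make_border_mask(h, w, pad)          # (H, W)
    prompted = images + prompt[None] * mask      # broadcast over batch
    return np.clip(prompted, 0.0, 1.0)           # keep a valid pixel range

# Hypothetical usage: in the actual method the prompt is optimized by
# backpropagation through a frozen model; here it is random initialization.
rng = np.random.default_rng(0)
images = rng.random((2, 3, 32, 32))
prompt = 0.1 * rng.standard_normal((3, 32, 32))
out = apply_visual_prompt(images, prompt, pad=4)
```

Because the perturbation is confined to the border, the interior pixels of each image are left untouched; only the prompt parameters are trained, so the adaptation cost is a single image-sized tensor regardless of model size.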