With the surge of large-scale pre-trained models (PTMs), fine-tuning these models for numerous downstream tasks has become a crucial problem. Consequently, parameter-efficient transfer learning (PETL) of large models has attracted significant attention. While recent PETL methods showcase impressive performance, they rely on optimistic assumptions: 1) the entire parameter set of a PTM is available, and 2) sufficient memory capacity is available for fine-tuning. However, in most real-world applications, PTMs are served as black-box APIs or proprietary software without explicit parameter accessibility. Moreover, the large memory requirements of modern PTMs are hard to satisfy. In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of their architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent, image-shaped visual prompts, which improve few-shot adaptation and robustness under distribution/location shifts. SPSA-GC efficiently estimates the gradient of the target model to update the Coordinator. Extensive experiments on 16 datasets demonstrate that BlackVIP enables robust adaptation to diverse domains without accessing PTMs' parameters and with minimal memory requirements. Code: \url{https://github.com/changdaeoh/BlackVIP}
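For readers unfamiliar with SPSA, the sketch below illustrates the standard two-sided SPSA gradient estimator that SPSA-GC builds on, together with a Nesterov-style look-ahead correction. It is a minimal illustration, not the authors' implementation: the function names (`spsa_gradient`, `spsa_gc_step`), the fixed values of `lr`, `c`, and `beta`, and the exact form of the look-ahead correction are assumptions; the paper and repository specify the actual algorithm and the decaying schedules for the step size and perturbation magnitude.

```python
import numpy as np

def spsa_gradient(loss_fn, theta, c, rng):
    """Two-sided SPSA estimate of the gradient of a black-box loss.

    loss_fn : black-box objective; only function evaluations (model
              queries) are available, no autograd.
    theta   : current parameter vector (e.g., of the Coordinator).
    c       : perturbation magnitude.
    """
    # Rademacher perturbation direction: each coordinate is +1 or -1.
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    # Two forward queries approximate the directional derivative along delta.
    loss_plus = loss_fn(theta + c * delta)
    loss_minus = loss_fn(theta - c * delta)
    # Element-wise estimate; for Rademacher entries, 1/delta == delta.
    return (loss_plus - loss_minus) / (2.0 * c) * delta

def spsa_gc_step(loss_fn, theta, m, lr=0.01, c=0.01, beta=0.9,
                 rng=None):
    """One parameter update with a momentum buffer `m`.

    The gradient is estimated at the look-ahead point theta + beta * m,
    in the spirit of Nesterov's accelerated gradient; treating this as
    the "gradient correction" is an assumption of this sketch.
    """
    rng = rng or np.random.default_rng()
    g_hat = spsa_gradient(loss_fn, theta + beta * m, c, rng)
    m = beta * m - lr * g_hat  # update momentum with corrected gradient
    return theta + m, m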
```

Usage would initialize `m = np.zeros_like(theta)` and call `spsa_gc_step` once per iteration; each step costs exactly two queries to the black-box model, independent of the dimensionality of `theta`, which is what makes the approach memory- and access-efficient.