Vision foundation models exhibit impressive power, benefiting from their extremely large model capacity and broad training data. In practice, however, downstream scenarios may only support a small model due to limited computational resources or efficiency considerations. Moreover, the data used for pretraining foundation models are usually inaccessible and very different from the target data of downstream tasks. This poses a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to a downstream model with a quite different architecture, given only the downstream target data. Existing transfer learning and knowledge distillation methods depend on either an identical model structure or finetuning of the foundation model, so naively applying them is either infeasible or very inefficient. To address this, we propose a Task-Driven Model Reprogramming (TDMR) framework. Specifically, we reprogram the foundation model to project its knowledge into a proxy space, which alleviates the adverse effects of task mismatch and domain inconsistency. We then reprogram the target model via progressive distillation from the proxy space, so that it efficiently learns the knowledge of the reprogrammed foundation model. TDMR is compatible with different pre-trained model types (CNN, transformer, or their mix) and with limited target data, and it promotes the wide application of vision foundation models to downstream tasks in a cost-effective manner. Extensive experiments on different downstream classification tasks and target model structures demonstrate the effectiveness of our method with both CNN and transformer foundation models.
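To make the pipeline concrete, the following is a minimal PyTorch sketch of the two-stage idea the abstract describes, under assumptions of our own: a learnable input perturbation (`InputReprogram`) plus a linear output mapping (`ProxyHead`) stand in for the foundation-model reprogramming, and vanilla logit distillation (`kd_loss`) stands in for the progressive distillation. All module names, the toy models, and the hyperparameters are illustrative placeholders, not the paper's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InputReprogram(nn.Module):
    """Learnable additive perturbation that adapts target-domain images to the
    frozen foundation model's input resolution (input-space reprogramming)."""
    def __init__(self, image_size=224, channels=3):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(1, channels, image_size, image_size))

    def forward(self, x):
        x = F.interpolate(x, size=self.delta.shape[-2:], mode="bilinear",
                          align_corners=False)
        return x + self.delta

class ProxyHead(nn.Module):
    """Linear map from the foundation model's output space into a proxy space
    aligned with the downstream label set."""
    def __init__(self, source_dim, num_classes):
        super().__init__()
        self.proj = nn.Linear(source_dim, num_classes)

    def forward(self, z):
        return self.proj(z)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft targets from the reprogrammed teacher plus hard cross-entropy on
    the (limited) target labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    return alpha * soft + (1.0 - alpha) * F.cross_entropy(student_logits, labels)

# Stand-ins: `foundation` is any frozen pretrained model with a 1000-way output;
# `student` is the small downstream model. Both are hypothetical placeholders.
foundation = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000)).eval()
for p in foundation.parameters():
    p.requires_grad_(False)

reprog = InputReprogram()
proxy = ProxyHead(source_dim=1000, num_classes=10)
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

x = torch.randn(8, 3, 32, 32)       # toy target-domain batch
y = torch.randint(0, 10, (8,))      # toy target labels

# Stage 1: reprogram the frozen foundation model toward the target task.
opt1 = torch.optim.Adam([*reprog.parameters(), *proxy.parameters()], lr=1e-3)
stage1_loss = F.cross_entropy(proxy(foundation(reprog(x))), y)
opt1.zero_grad(); stage1_loss.backward(); opt1.step()

# Stage 2: distill from the proxy space into the small target model.
opt2 = torch.optim.Adam(student.parameters(), lr=1e-3)
with torch.no_grad():
    teacher_logits = proxy(foundation(reprog(x)))
loss = kd_loss(student(x), teacher_logits, y)
opt2.zero_grad(); loss.backward(); opt2.step()
```

Note that only the reprogramming modules and the small target model are ever updated; the foundation model stays frozen throughout, which is what makes the approach feasible without finetuning. A faithful implementation would replace the additive perturbation and single linear mapping with whatever reprogramming modules the paper specifies, and would schedule the distillation progressively rather than in one step.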