Parameter-efficient transfer learning (PETL) is an emerging research topic aimed at inexpensively adapting large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in saving storage costs for various vision tasks by updating or injecting a small number of parameters instead of full fine-tuning. However, we notice that most existing PETL methods still incur non-negligible latency during inference. In this paper, we propose a parameter-efficient and computationally friendly adapter for giant vision models, called RepAdapter. Specifically, we prove that the adaptation modules, even with a complex structure, can be seamlessly integrated into most giant vision models via structural re-parameterization, which makes RepAdapter zero-cost during inference. Beyond computational efficiency, RepAdapter is also more effective and lightweight than existing PETL methods thanks to its sparse structure and careful deployment. To validate RepAdapter, we conduct extensive experiments on 27 benchmark datasets spanning three vision tasks: image classification, video classification, and semantic segmentation. Experimental results show the superior performance and efficiency of RepAdapter over state-of-the-art PETL methods. For instance, by updating only 0.6% of the parameters, we improve the performance of ViT from 38.8 to 55.1 on Sun397. Its generalizability is also well validated on a variety of vision models, including ViT, CLIP, Swin-Transformer, and ConvNeXt. Our source code is released at https://github.com/luogen1996/RepAdapter.
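To make the re-parameterization idea concrete, below is a minimal sketch of how a sequential linear adapter placed before a projection layer can be folded into that layer's weights after training, yielding zero extra inference cost. The helper name `fuse_adapter` and the dense adapter shape are illustrative assumptions for this sketch, not the repository's API; RepAdapter's actual adapter uses a sparser (group-wise) structure, but the same linear-fusion identity applies.

```python
import torch
import torch.nn as nn

def fuse_adapter(adapter: nn.Linear, proj: nn.Linear) -> nn.Linear:
    """Fold the composition y = proj(adapter(x)) into one linear layer.

    proj(adapter(x)) = W_p (W_a x + b_a) + b_p
                     = (W_p W_a) x + (W_p b_a + b_p)
    """
    fused = nn.Linear(adapter.in_features, proj.out_features)
    with torch.no_grad():
        # nn.Linear stores weights as (out_features, in_features),
        # so the fused weight is the plain matrix product W_p @ W_a.
        fused.weight.copy_(proj.weight @ adapter.weight)
        fused.bias.copy_(proj.weight @ adapter.bias + proj.bias)
    return fused

# Sanity check: the fused layer reproduces the two-layer composition.
adapter = nn.Linear(768, 768)  # dense adapter, for simplicity of the sketch
proj = nn.Linear(768, 768)     # the frozen projection of the host model
x = torch.randn(2, 768)
fused = fuse_adapter(adapter, proj)
assert torch.allclose(fused(x), proj(adapter(x)), atol=1e-5)
```

Because the fusion is an exact algebraic identity rather than an approximation, the merged model is bit-for-bit equivalent (up to floating-point rounding) to the adapted one while keeping the original architecture and parameter count at inference time.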