While parameter-efficient tuning (PET) methods have shown great potential with transformer architectures on Natural Language Processing (NLP) tasks, their effectiveness is still under-studied with large-scale ConvNets on Computer Vision (CV) tasks. This paper proposes Conv-Adapter, a PET module designed for ConvNets. Conv-Adapter is light-weight, domain-transferable, and architecture-agnostic, with generalized performance across different tasks. When transferring to downstream tasks, Conv-Adapter learns a task-specific feature modulation of the intermediate representations of the backbone while keeping the pre-trained parameters frozen. By introducing only a tiny amount of learnable parameters, e.g., only 3.5% of the full fine-tuning parameters of ResNet50, Conv-Adapter outperforms previous PET baseline methods and achieves performance comparable to or surpassing that of full fine-tuning on 23 classification tasks from various domains. It also shows superior performance on few-shot classification, with an average margin of 3.39%. Beyond classification, Conv-Adapter generalizes to detection and segmentation tasks with a more than 50% reduction in parameters while maintaining performance comparable to traditional full fine-tuning.
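To make the mechanism concrete, below is a minimal PyTorch sketch of the general idea: a small bottleneck adapter runs in parallel with a frozen convolutional block and adds a residual feature modulation to its output. The `ConvAdapter` class, its reduction ratio, and the depthwise convolution here are illustrative assumptions, not the paper's exact layer composition.

```python
import torch
import torch.nn as nn


class ConvAdapter(nn.Module):
    """Illustrative bottleneck adapter for a ConvNet block (hypothetical sketch).

    Down-projects channels with a 1x1 conv, applies a depthwise conv and a
    nonlinearity, then up-projects back, yielding a task-specific residual
    modulation of the frozen block's features.
    """

    def __init__(self, channels: int, reduction: int = 4, kernel_size: int = 3):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.down = nn.Conv2d(channels, hidden, kernel_size=1, bias=False)
        self.dw = nn.Conv2d(hidden, hidden, kernel_size,
                            padding=kernel_size // 2, groups=hidden, bias=False)
        self.act = nn.GELU()
        self.up = nn.Conv2d(hidden, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.dw(self.down(x))))


if __name__ == "__main__":
    # Stand-in for one frozen block of a pre-trained backbone.
    frozen_block = nn.Conv2d(64, 64, 3, padding=1)
    for p in frozen_block.parameters():
        p.requires_grad = False  # pre-trained parameters stay frozen

    adapter = ConvAdapter(64)  # only these parameters are trained
    x = torch.randn(2, 64, 32, 32)
    y = frozen_block(x) + adapter(x)  # parallel residual modulation
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

Because only the adapter's bottleneck parameters receive gradients, the trainable parameter count stays a small fraction of the backbone's, which is the source of the efficiency figures quoted above.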