Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance. In this work, we study an alternative finetuning method, where instead of finetuning all the weights of the network, we only train a carefully chosen subset of layers, keeping the rest of the weights frozen at their initial (pretrained) values. We demonstrate that \emph{subset finetuning} (or SubTuning) often achieves accuracy comparable to full finetuning of the model, and even surpasses the performance of full finetuning when training data is scarce. Therefore, SubTuning allows deploying new tasks at minimal computational cost, while enjoying the benefits of finetuning the entire model. This yields a simple and effective method for multi-task learning, where different tasks do not interfere with one another, and yet share most of the resources at inference time. We demonstrate the efficiency of SubTuning across multiple tasks, using different network architectures and pretraining methods.
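To make the idea concrete, here is a minimal sketch of freezing a pretrained backbone and training only a chosen subset of layers, assuming a torchvision ResNet-50 and an illustrative choice of the last residual block plus a new task head (the paper's point is that this subset should be chosen carefully; the selection here is only for demonstration).

```python
# Minimal SubTuning-style sketch: freeze a pretrained model, then unfreeze
# only a small subset of layers and a task-specific head.
# Assumptions: torchvision ResNet-50 backbone; the trained subset (last block
# of layer4 + new fc head) is illustrative, not the paper's selected subset.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical number of classes for the new task

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze every pretrained weight.
for p in model.parameters():
    p.requires_grad = False

# Replace the classification head for the new task.
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Unfreeze only the chosen subset: the last residual block and the new head.
for p in model.layer4[-1].parameters():
    p.requires_grad = True
for p in model.fc.parameters():
    p.requires_grad = True

# Optimize only the unfrozen parameters; the rest stay at pretrained values.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Because the frozen weights are shared across tasks, each new task only adds the parameters of its trained subset at inference time.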