In recent years, pretrained models have revolutionized the paradigm of natural language understanding (NLU): we append a randomly initialized classification head to a pretrained backbone, e.g. BERT, and finetune the whole model. Since the pretrained backbone accounts for most of the improvement, we naturally expect that a well-pretrained classification head could also benefit training. However, the final-layer output of the backbone, i.e. the input to the classification head, changes greatly during finetuning, which makes the usual head-only pretraining (LP-FT) ineffective. In this paper, we find that parameter-efficient tuning makes a good classification head, with which we can simply replace the randomly initialized head for a stable performance gain. Our experiments demonstrate that a classification head jointly pretrained with parameter-efficient tuning consistently improves performance on 9 tasks in GLUE and SuperGLUE.
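To make the two-stage recipe concrete, below is a minimal PyTorch sketch, assuming a BERT-style backbone whose pooled final-layer output feeds the head. The head architecture, hidden size, and variable names are illustrative assumptions, not the authors' released code; the parameter-efficient modules (e.g. prompts or adapters) trained alongside the head in stage 1 are elided.

```python
import torch
import torch.nn as nn

# Assumed setup: a frozen BERT-style backbone produces a pooled
# representation of size `hidden` for each input sequence.
hidden, num_labels = 768, 2

class ClassificationHead(nn.Module):
    """A standard BERT-style classification head (dense + tanh + projection)."""
    def __init__(self, hidden: int, num_labels: int):
        super().__init__()
        self.dense = nn.Linear(hidden, hidden)
        self.out_proj = nn.Linear(hidden, num_labels)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.out_proj(torch.tanh(self.dense(pooled)))

# Stage 1 (sketch): pretrain the head with parameter-efficient tuning.
# The backbone stays frozen; only the head and a small set of extra
# parameters (e.g. prompt/adapter weights, not shown) are updated.
pet_head = ClassificationHead(hidden, num_labels)
# ... train `pet_head` together with the parameter-efficient modules ...

# Stage 2 (sketch): full finetuning. Instead of a randomly initialized
# head, load the head obtained in stage 1 and finetune the whole model.
finetune_head = ClassificationHead(hidden, num_labels)
finetune_head.load_state_dict(pet_head.state_dict())
# ... finetune the backbone jointly with `finetune_head` ...
```

The only change from the usual finetuning pipeline is the `load_state_dict` step replacing random initialization of the head, which is what yields the stable gain reported above.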