Large-scale pre-trained language models have recently achieved impressive results on a wide range of downstream tasks. However, fine-tuning an extremely large-scale pre-trained language model on limited target datasets is often plagued by overfitting and representation degradation. In this paper, we propose a Dynamic Parameter Selection (DPS) algorithm for large-scale pre-trained models during fine-tuning, which adaptively selects a more promising subnetwork to perform staged updates based on gradients from back-propagation. Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results across various pre-trained language models. In addition, DPS yields substantial improvements in out-of-domain transfer experiments and low-resource scenarios, which shows that it can maintain stable general contextual features and mitigate representation collapse. We release our code at https://github.com/ZhangHaojie077/DPS.
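To make the idea of gradient-guided subnetwork updates concrete, the following is a minimal, hypothetical PyTorch sketch: after back-propagation, parameters with the largest gradient magnitudes are kept in the update, while the rest are frozen for that step. This is an illustrative assumption of the general mechanism, not the authors' exact DPS procedure; the names `select_subnetwork_masks`, `masked_update_step`, and `update_ratio` are invented for this sketch.

```python
# Hypothetical sketch of gradient-based subnetwork selection; not the exact DPS algorithm.
import torch


def select_subnetwork_masks(model, update_ratio=0.3):
    """For each parameter tensor, keep only entries whose gradient magnitude
    lies in the top `update_ratio` fraction of that tensor."""
    masks = {}
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        grad_mag = param.grad.detach().abs()
        k = max(1, int(update_ratio * grad_mag.numel()))
        threshold = torch.topk(grad_mag.flatten(), k).values.min()
        masks[name] = (grad_mag >= threshold).float()
    return masks


def masked_update_step(model, optimizer, masks):
    """Zero out gradients outside the selected subnetwork, then update."""
    for name, param in model.named_parameters():
        if param.grad is not None and name in masks:
            param.grad.mul_(masks[name])
    optimizer.step()
    optimizer.zero_grad()
```

In a typical fine-tuning loop, one would call `loss.backward()`, build the masks with `select_subnetwork_masks`, and then apply `masked_update_step`, re-selecting the subnetwork at the start of each stage so that the selection adapts to the current gradients.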