Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with far fewer trainable parameters and perform especially well when training data is limited. However, different PELT methods may perform rather differently on the same task, making it nontrivial to select the most appropriate method for a specific task, especially given the fast-growing number of new PELT methods and tasks. In light of model diversity and the difficulty of model selection, we propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup via a gating mechanism. On the GLUE benchmark, UniPELT consistently achieves 1~4% gains over the best individual PELT method that it incorporates and even outperforms fine-tuning under different setups. Moreover, UniPELT generally surpasses the upper bound obtained by taking the best performance of each of its submodules used individually on each task, indicating that a mixture of multiple PELT methods may be inherently more effective than any single method.
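To make the gating idea concrete, the following is a minimal PyTorch sketch of how learned gates could scale the contribution of each PELT submodule inside a Transformer layer. It is not the authors' implementation: the class names (ToyAdapterUpdate, ToyLoRAUpdate, GatedPELTLayer), the use of only two submodules, the scalar-gate parameterization, and the mean-pooled gate input are illustrative assumptions for this sketch.

```python
# Minimal sketch of gated PELT submodules (illustrative, not the UniPELT code).
# Assumptions: two trainable submodules, one sigmoid gate per submodule,
# gates predicted from the mean-pooled layer input.
import torch
import torch.nn as nn


class ToyAdapterUpdate(nn.Module):
    """Bottleneck adapter update: down-project, nonlinearity, up-project."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(torch.relu(self.down(x)))


class ToyLoRAUpdate(nn.Module):
    """Low-rank update intended to be added to a frozen projection's output."""
    def __init__(self, hidden_dim: int, rank: int = 8):
        super().__init__()
        self.lora_a = nn.Linear(hidden_dim, rank, bias=False)
        self.lora_b = nn.Linear(rank, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lora_b(self.lora_a(x))


class GatedPELTLayer(nn.Module):
    """Adds each submodule's trainable update to the hidden states, scaled by
    a gate predicted from the layer input, so training can learn which
    submodule(s) to rely on for the current task or data setup."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.adapter = ToyAdapterUpdate(hidden_dim)
        self.lora = ToyLoRAUpdate(hidden_dim)
        self.gate_proj = nn.Linear(hidden_dim, 2)  # one scalar gate per submodule

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim)
        gates = torch.sigmoid(self.gate_proj(hidden.mean(dim=1)))  # (batch, 2)
        g_lora = gates[:, 0].view(-1, 1, 1)
        g_adapter = gates[:, 1].view(-1, 1, 1)
        # Each gate scales how much of its submodule's update is added
        # to the frozen backbone's hidden states.
        hidden = hidden + g_lora * self.lora(hidden)
        hidden = hidden + g_adapter * self.adapter(hidden)
        return hidden


if __name__ == "__main__":
    layer = GatedPELTLayer(hidden_dim=768)
    x = torch.randn(4, 16, 768)
    print(layer(x).shape)  # torch.Size([4, 16, 768])
```

In this sketch, only the submodules and the gate projection are trainable; a gate near zero effectively deactivates its submodule, which is the behavior the abstract describes as learning to activate the submodules that best suit the current setup.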