Parameter-efficient tuning (PETuning) methods have been regarded by many as the new paradigm for using pretrained language models (PLMs). By tuning only a small fraction of parameters compared to full-model finetuning, PETuning methods claim to achieve performance on par with or even better than finetuning. In this work, we take a step back and re-examine these claims by conducting the first comprehensive investigation into the training and evaluation of PETuning methods. We find that the problematic validation and testing practices in current studies, combined with the inherent instability of PETuning methods, have led to unreliable conclusions. When compared under a truly fair evaluation protocol, PETuning does not yield consistently competitive performance, and finetuning remains the best-performing method in medium- and high-resource settings. We delve deeper into the cause of this instability and observe that model size does not explain the phenomenon, whereas the number of training iterations positively correlates with stability.