Fine-tuning a pre-trained model (such as BERT, ALBERT, RoBERTa, T5, GPT, etc.) has proven to be one of the most promising paradigms in recent NLP research. However, numerous recent works indicate that fine-tuning suffers from an instability problem: fine-tuning the same model under the same setting across different runs can yield significantly different performance. Many recent works have proposed methods to mitigate this problem, but there is no theoretical understanding of why and how these methods work. In this paper, we propose a novel theoretical stability analysis of fine-tuning that focuses on two commonly used settings, namely, full fine-tuning and head tuning. We define stability under each setting and prove the corresponding stability bounds. These theoretical bounds explain why and how several existing methods stabilize the fine-tuning procedure. Beyond explaining most of the observed empirical findings, our theoretical analysis framework can also guide the design of effective methods with provable guarantees. Based on our theory, we propose three novel strategies to stabilize the fine-tuning procedure, namely, Maximal Margin Regularizer (MMR), Multi-Head Loss (MHLoss), and Self Unsupervised Re-Training (SURT). We extensively evaluate our proposed approaches on 11 widely used real-world benchmark datasets, as well as hundreds of synthetic classification datasets. The experimental results show that our proposed methods significantly stabilize the fine-tuning procedure and corroborate our theoretical analysis.