Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP tasks. Despite its recent success and wide adoption, this process is unstable when there are only a small number of training samples available. The brittleness of the process is often reflected in its sensitivity to random seeds. In this paper, we propose to tackle this problem based on the noise stability property of deep nets, which has been investigated in recent literature (Arora et al., 2018; Sanyal et al., 2020). Specifically, we introduce a novel and effective regularization method to improve fine-tuning on NLP tasks, referred to as Layer-wise Noise Stability Regularization (LNSR). We extend the theory of adding noise to the input and prove that our method yields a more stable regularization effect. We provide supporting evidence by experimentally confirming that well-performing models show low sensitivity to noise and that fine-tuning with LNSR exhibits clearly higher generalizability and stability. Furthermore, our method also demonstrates advantages over other state-of-the-art algorithms, including L2-SP (Li et al., 2018), Mixout (Lee et al., 2020) and SMART (Jiang et al., 2020).
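As a rough illustration of the idea (a minimal sketch, not the paper's exact formulation), the code below injects Gaussian noise into BERT's embedding output and penalizes the L2 drift of the subsequent layers' hidden states; the noise scale noise_std, the weight lnsr_weight, and the choice of perturbed layer are illustrative assumptions.

```python
# Minimal sketch of a layer-wise noise stability penalty for BERT fine-tuning,
# assuming the general recipe described in the abstract (perturb a hidden
# representation with Gaussian noise, penalize the shift in later layers).
# `noise_std` and `lnsr_weight` are illustrative hyperparameters.
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

noise_std = 0.01    # assumed noise scale
lnsr_weight = 1.0   # assumed regularization strength


def lnsr_penalty(model, input_ids, attention_mask):
    """Sum of L2 distances between clean and noise-perturbed layer outputs."""
    # Clean forward pass, keeping every layer's hidden states
    # (hidden_states[0] is the embedding output, [1:] are encoder layers).
    clean = model(input_ids, attention_mask=attention_mask,
                  output_hidden_states=True).hidden_states

    # Perturb the embedding output with Gaussian noise and re-run the encoder.
    embeddings = clean[0]
    noisy_input = embeddings + noise_std * torch.randn_like(embeddings)

    # Standard additive attention mask expected by the encoder.
    ext_mask = attention_mask[:, None, None, :].to(dtype=embeddings.dtype)
    ext_mask = (1.0 - ext_mask) * torch.finfo(embeddings.dtype).min

    noisy = model.encoder(noisy_input, attention_mask=ext_mask,
                          output_hidden_states=True).hidden_states

    # Penalize how far each layer's output drifts under the input noise.
    drift = sum(F.mse_loss(n, c) for n, c in zip(noisy[1:], clean[1:]))
    return lnsr_weight * drift


# Example usage (illustrative); during fine-tuning the penalty would simply
# be added to the task loss, e.g. loss = task_loss + lnsr_penalty(...).
enc = tokenizer("fine-tuning can be unstable", return_tensors="pt")
penalty = lnsr_penalty(model, enc["input_ids"], enc["attention_mask"])
```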