This paper studies the fine-tuning of BERT contextual representations, with a focus on the instabilities commonly observed in few-sample scenarios. We identify several factors that cause this instability: the common use of a non-standard optimization method with biased gradient estimation; the limited applicability of significant parts of the BERT network to down-stream tasks; and the prevalent practice of using a pre-determined, small number of training iterations. We empirically test the impact of these factors and identify alternative practices that resolve the commonly observed instability of the process. In light of these observations, we revisit recently proposed methods for improving few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe that the impact of these methods diminishes significantly with our modified process.
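As a minimal sketch of the "biased gradient estimation" issue, the snippet below contrasts a standard Adam moment update with a variant that omits the bias-correction terms (as the BERTAdam-style optimizer commonly used for BERT fine-tuning does). All function and variable names here are illustrative, not taken from any particular codebase.

```python
def adam_moments(m, v, g, t, beta1=0.9, beta2=0.999, bias_correction=True):
    """One Adam moment update for a scalar gradient g at step t.

    Returns the raw moments (m, v) and the estimates (m_hat, v_hat)
    actually used in the parameter update.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    if bias_correction:
        m_hat = m / (1 - beta1 ** t)  # debias first moment
        v_hat = v / (1 - beta2 ** t)  # debias second moment
    else:
        # BERTAdam-style variant: no correction, so early-step moment
        # estimates are biased toward zero.
        m_hat, v_hat = m, v
    return m, v, m_hat, v_hat

# With a constant gradient g = 1.0 at the very first step, the corrected
# first-moment estimate recovers the true gradient mean (1.0), while the
# uncorrected estimate is only 0.1 — a 10x underestimate early in training.
_, _, m_corrected, _ = adam_moments(0.0, 0.0, 1.0, t=1, bias_correction=True)
_, _, m_uncorrected, _ = adam_moments(0.0, 0.0, 1.0, t=1, bias_correction=False)
```

The bias shrinks as t grows (the `1 - beta ** t` factors approach 1), which is why the omission matters most in the short, few-sample fine-tuning runs the paper examines.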