Transfer learning with large pretrained transformer-based language models such as BERT has become the dominant approach for most NLP tasks. Simply fine-tuning these large language models on downstream tasks, or combining fine-tuning with task-specific pretraining, is often not robust. In particular, performance varies considerably as the random seed changes or as the number of pretraining and/or fine-tuning iterations varies, and the fine-tuned model is vulnerable to adversarial attacks. We propose a simple yet effective adapter-based approach to mitigate these issues. Specifically, we insert small bottleneck layers (i.e., adapters) within each layer of a pretrained model, then freeze the pretrained layers and train only the adapter layers on the downstream task data, with (1) task-specific unsupervised pretraining followed by (2) task-specific supervised training (e.g., classification, sequence labeling). Our experiments demonstrate that this training scheme leads to improved stability and adversarial robustness in transfer learning to various downstream tasks.
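For concreteness, the following is a minimal PyTorch sketch of the adapter idea described above: a bottleneck module with a residual connection, plus a routine that freezes the pretrained weights so that only adapter parameters are trained. The class name `Adapter`, the bottleneck width, and the name-based freezing heuristic are illustrative assumptions, not the exact implementation used in our experiments.

```python
# Sketch only: module names, bottleneck size, and the "adapter" name filter
# are illustrative assumptions, not the paper's exact implementation.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the adapter close to an identity map
        # at initialization, so the pretrained representations are preserved.
        return x + self.up(self.act(self.down(x)))


def trainable_adapter_params(model: nn.Module):
    """Freeze all pretrained weights; return only adapter parameters to optimize."""
    params = []
    for name, param in model.named_parameters():
        if "adapter" in name:
            param.requires_grad = True
            params.append(param)
        else:
            param.requires_grad = False
    return params
```

In this sketch, the parameters returned by `trainable_adapter_params` would be passed to the optimizer for both training stages, i.e., task-specific unsupervised pretraining and the subsequent supervised training on the downstream task.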