We propose a simple and general method to regularize the fine-tuning of Transformer-based encoders for text classification tasks. Specifically, during fine-tuning we generate adversarial examples by perturbing the word embeddings of the model and perform contrastive learning on clean and adversarial examples in order to teach the model to learn noise-invariant representations. By training on both clean and adversarial examples along with the additional contrastive objective, we observe consistent improvements over standard fine-tuning on clean examples. On several GLUE benchmark tasks, our fine-tuned BERT Large model outperforms the BERT Large baseline by 1.7% on average, and our fine-tuned RoBERTa Large improves over the RoBERTa Large baseline by 1.3%. We additionally validate our method in different domains using three intent classification datasets, where our fine-tuned RoBERTa Large outperforms the RoBERTa Large baseline by 1-2% on average.
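The objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a tiny stand-in encoder replaces BERT/RoBERTa, the embedding perturbation is assumed to be FGSM-style (sign of the gradient), the contrastive term is assumed to be an InfoNCE loss treating clean/adversarial views of the same sentence as positives, and the hyperparameters `epsilon`, `tau`, and the equal weighting of the three loss terms are all assumptions.

```python
# Hedged sketch of adversarial-contrastive fine-tuning (NOT the authors' code).
# TinyEncoder is a stand-in for a Transformer encoder; epsilon/tau are assumed.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

class TinyEncoder(torch.nn.Module):
    def __init__(self, vocab=100, dim=16, classes=2):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.enc = torch.nn.Linear(dim, dim)
        self.cls = torch.nn.Linear(dim, classes)

    def forward_from_embeddings(self, e):
        # Mean-pool token states into a sentence representation, then classify.
        h = torch.tanh(self.enc(e)).mean(dim=1)
        return h, self.cls(h)

def adversarial_contrastive_step(model, tokens, labels, epsilon=1e-2, tau=0.1):
    # Clean forward pass from the word embeddings.
    e_clean = model.emb(tokens).detach().requires_grad_(True)
    h_clean, logits_clean = model.forward_from_embeddings(e_clean)
    ce_clean = F.cross_entropy(logits_clean, labels)

    # FGSM-style adversarial perturbation of the word embeddings (assumption).
    grad, = torch.autograd.grad(ce_clean, e_clean, retain_graph=True)
    e_adv = e_clean.detach() + epsilon * grad.sign()

    # Adversarial forward pass on the perturbed embeddings.
    h_adv, logits_adv = model.forward_from_embeddings(e_adv)
    ce_adv = F.cross_entropy(logits_adv, labels)

    # InfoNCE contrastive term: the clean and adversarial views of the same
    # sentence are positives; other sentences in the batch are negatives.
    z1 = F.normalize(h_clean, dim=-1)
    z2 = F.normalize(h_adv, dim=-1)
    sim = z1 @ z2.t() / tau
    contrastive = F.cross_entropy(sim, torch.arange(tokens.size(0)))

    # Equal weighting of the three terms is an assumption.
    return ce_clean + ce_adv + contrastive

model = TinyEncoder()
tokens = torch.randint(0, 100, (4, 7))   # batch of 4 sentences, 7 tokens each
labels = torch.randint(0, 2, (4,))
loss = adversarial_contrastive_step(model, tokens, labels)
```

Pushing clean and adversarial representations of the same input together (while separating them from other examples in the batch) is what encourages the noise-invariant representations the abstract refers to.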