Recent prompt-based approaches allow pretrained language models to achieve strong performance in few-shot finetuning by reformulating downstream tasks as a language modeling problem. In this work, we demonstrate that, despite their advantage in low-data regimes, finetuned prompt-based models for sentence-pair classification tasks still suffer from a common pitfall of adopting inference heuristics based on lexical overlap, e.g., models incorrectly assuming that two sentences have the same meaning because they consist of the same set of words. Interestingly, we find that this particular inference heuristic is significantly less present in the zero-shot evaluation of the prompt-based model, indicating how finetuning can be destructive to useful knowledge learned during pretraining. We then show that adding a regularization that preserves the pretrained weights is effective in mitigating this destructive tendency of few-shot finetuning. Our evaluation on three datasets demonstrates promising improvements on the three corresponding challenge datasets used to diagnose these inference heuristics.
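To make the reformulation concrete, the sketch below scores a sentence pair as a paraphrase by letting a masked language model fill a cloze template; the template and the label words "Yes"/"No" are illustrative assumptions in the style of PET/LM-BFF, not necessarily the exact prompt used in this work.

```python
# A minimal sketch of prompt-based sentence-pair classification as a
# cloze task. Template and label words are illustrative assumptions.
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForMaskedLM.from_pretrained("roberta-large")
model.eval()

def paraphrase_score(sent1: str, sent2: str) -> dict:
    # Cloze template: the model fills the mask with "Yes" (paraphrase)
    # or "No" (not a paraphrase).
    text = f"{sent1} ? <mask> , {sent2}"
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Compare the logits of the two label words at the mask position.
    # RoBERTa's BPE marks a leading space with "Ġ".
    yes_id = tokenizer.convert_tokens_to_ids("ĠYes")
    no_id = tokenizer.convert_tokens_to_ids("ĠNo")
    return {"yes": logits[yes_id].item(), "no": logits[no_id].item()}

# A lexical-overlap probe: same bag of words, different meaning.
print(paraphrase_score("The doctor called the lawyer.",
                       "The lawyer called the doctor."))
```

A model relying on the lexical-overlap heuristic would assign a high "yes" score to this pair even though the two sentences are not paraphrases.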
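One simple way to realize a weight-preserving regularization of the kind described above is an L2 penalty that pulls the finetuned parameters back toward their pretrained values (in the spirit of L2-SP). The sketch below is a minimal illustration under that assumption; the function name and the `reg_strength` value are hypothetical, and the exact regularizer used in the paper may differ.

```python
# A minimal sketch of a weight-preserving regularizer for finetuning:
# an L2 penalty toward the pretrained weights (L2-SP style). The exact
# form and hyperparameter are illustrative assumptions.
import torch

def add_pretrained_l2_penalty(model, pretrained_state, loss, reg_strength=0.01):
    """Return loss + reg_strength * ||theta - theta_pretrained||^2."""
    penalty = torch.tensor(0.0, device=loss.device)
    for name, param in model.named_parameters():
        if param.requires_grad and name in pretrained_state:
            ref = pretrained_state[name].to(param.device)
            penalty = penalty + torch.sum((param - ref) ** 2)
    return loss + reg_strength * penalty

# Usage: snapshot the pretrained weights once, before finetuning starts.
# pretrained_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
# Then, inside the training loop:
# loss = add_pretrained_l2_penalty(model, pretrained_state, task_loss)
# loss.backward()
```

The penalty keeps the few-shot-finetuned model close to its pretrained solution, which is the mechanism the abstract credits with mitigating the destructive tendency of finetuning.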