Recent prompt-based approaches allow pretrained language models to achieve strong performance in few-shot finetuning by reformulating downstream tasks as a language modeling problem. In this work, we demonstrate that, despite their advantage in low-data regimes, finetuned prompt-based models for sentence-pair classification tasks still suffer from a common pitfall of adopting inference heuristics based on lexical overlap, e.g., models incorrectly assuming that two sentences have the same meaning because they consist of the same set of words. Interestingly, we find that this particular inference heuristic is significantly less present in the zero-shot evaluation of the prompt-based model, indicating how finetuning can be destructive to useful knowledge learned during pretraining. We then show that adding a regularization that preserves the pretrained weights is effective in mitigating this destructive tendency of few-shot finetuning. Our evaluation on three datasets demonstrates promising improvements on the three corresponding challenge datasets used to diagnose these inference heuristics.
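To make the reformulation concrete, the sketch below scores a sentence pair as a paraphrase by letting a masked language model fill a cloze template; the template and the label words "Yes"/"No" are illustrative assumptions in the style of PET/LM-BFF, not necessarily the exact prompt used in this work.

```python
# A minimal sketch of prompt-based sentence-pair classification as a
# cloze task. Template and label words are illustrative assumptions.
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForMaskedLM.from_pretrained("roberta-large")
model.eval()

def paraphrase_score(sent1: str, sent2: str) -> dict:
    # Cloze template: the model fills the mask with "Yes" (paraphrase)
    # or "No" (not a paraphrase).
    text = f"{sent1} ? <mask> , {sent2}"
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Compare the logits of the two label words at the mask position.
    # RoBERTa's BPE marks a leading space with "Ġ".
    yes_id = tokenizer.convert_tokens_to_ids("ĠYes")
    no_id = tokenizer.convert_tokens_to_ids("ĠNo")
    return {"yes": logits[yes_id].item(), "no": logits[no_id].item()}

# A lexical-overlap probe: same bag of words, different meaning.
print(paraphrase_score("The doctor called the lawyer.",
                       "The lawyer called the doctor."))
```

A model relying on the lexical-overlap heuristic would assign a high "yes" score to this pair even though the two sentences are not paraphrases.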
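One simple way to realize a weight-preserving regularization of the kind described above is an L2 penalty that pulls the finetuned parameters back toward their pretrained values (in the spirit of L2-SP). The sketch below is a minimal illustration under that assumption; the function name and the `reg_strength` value are hypothetical, and the exact regularizer used in the paper may differ.

```python
# A minimal sketch of a weight-preserving regularizer for finetuning:
# an L2 penalty toward the pretrained weights (L2-SP style). The exact
# form and hyperparameter are illustrative assumptions.
import torch

def add_pretrained_l2_penalty(model, pretrained_state, loss, reg_strength=0.01):
    """Return loss + reg_strength * ||theta - theta_pretrained||^2."""
    penalty = torch.tensor(0.0, device=loss.device)
    for name, param in model.named_parameters():
        if param.requires_grad and name in pretrained_state:
            ref = pretrained_state[name].to(param.device)
            penalty = penalty + torch.sum((param - ref) ** 2)
    return loss + reg_strength * penalty

# Usage: snapshot the pretrained weights once, before finetuning starts.
# pretrained_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
# Then, inside the training loop:
# loss = add_pretrained_l2_penalty(model, pretrained_state, task_loss)
# loss.backward()
```

The penalty keeps the few-shot-finetuned model close to its pretrained solution, which is the mechanism the abstract credits with mitigating the destructive tendency of finetuning.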