The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF (better few-shot fine-tuning of language models), a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low-resource setting, achieving up to 30% absolute improvement, and 11% on average, across all tasks. Our approach makes minimal assumptions about task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.
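To make the two ingredients concrete, the following is a minimal sketch of how a prompt-based input with in-context demonstrations might be assembled. The template ("It was [MASK].") and the label-word mapping are illustrative assumptions, not the paper's learned or searched-for choices, and `build_context` is a hypothetical helper:

```python
# Assumed label-word mapping for sentiment classification (illustrative only).
LABEL_WORDS = {"positive": "great", "negative": "terrible"}

def make_prompt(text: str) -> str:
    """Turn a raw input into a cloze-style prompt with a [MASK] slot
    (template is an assumption, not the paper's generated one)."""
    return f"{text} It was [MASK]."

def build_context(query: str, demonstrations: list) -> str:
    """Concatenate the query prompt with labeled demonstrations.
    Each demonstration uses the same template, but with [MASK]
    replaced by the label word for its gold label."""
    parts = [make_prompt(query)]
    for demo_text, demo_label in demonstrations:
        parts.append(make_prompt(demo_text).replace("[MASK]", LABEL_WORDS[demo_label]))
    return " ".join(parts)

demos = [("A beautiful, moving film.", "positive"),
         ("A waste of two hours.", "negative")]
context = build_context("The plot was gripping.", demos)
print(context)
```

A masked language model fine-tuned on such contexts then predicts the label by comparing the probabilities of the label words ("great" vs. "terrible") at the [MASK] position; the selective-demonstration strategy in the paper additionally filters which examples are sampled into `demos` for each query.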