The task of learning from only a few examples (the few-shot setting) is of key importance and relevance to real-world applications. For question answering (QA), the current state-of-the-art pre-trained models typically need fine-tuning on tens of thousands of examples to obtain good results. Their performance degrades significantly in a few-shot setting (< 100 examples). To address this, we propose a simple fine-tuning framework that leverages pre-trained text-to-text models and is directly aligned with their pre-training framework. Specifically, we construct the input as a concatenation of the question, a mask token representing the answer span, and a context. Given this input, the model is fine-tuned using the same objective as its pre-training objective. Through experimental studies on various few-shot configurations, we show that this formulation leads to significant gains on multiple QA benchmarks (an absolute gain of 34.2 F1 points on average when there are only 16 training examples). The gains extend further with larger models (e.g., 72.3 F1 on SQuAD using BART-large with only 32 examples) and translate well to a multilingual setting. On the multilingual TydiQA benchmark, our model outperforms XLM-Roberta-large by an absolute margin of up to 40 F1 points and an average of 33 F1 points in a few-shot setting (<= 64 training examples). We conduct detailed ablation studies to analyze the factors contributing to these gains.
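The input/target construction described above can be illustrated with a minimal sketch. This is not the authors' released code: it assumes a BART-style text-to-text model from the HuggingFace `transformers` library, and the choice of target (the input sequence with the mask filled in by the gold answer, mirroring BART's denoising reconstruction objective) is an assumption made for illustration; the example `question`, `context`, and `answer` strings are placeholders.

```python
# Sketch of the question + <mask> + context formulation, assuming a BART-style
# seq2seq model from HuggingFace transformers (not the authors' implementation).
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Illustrative placeholder example.
question = "Who wrote Hamlet?"
context = "Hamlet is a tragedy written by William Shakespeare."
answer = "William Shakespeare"

# Input: the question, a mask token standing in for the answer span, and the context.
source = f"{question} {tokenizer.mask_token} {context}"
# Target (assumption): the same sequence with the mask replaced by the answer,
# so fine-tuning reuses the model's denoising pre-training objective.
target = f"{question} {answer} {context}"

inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids

# One fine-tuning step: standard sequence-to-sequence cross-entropy loss,
# identical in form to the pre-training objective.
loss = model(**inputs, labels=labels).loss
loss.backward()
```

In a few-shot run, this step would simply be repeated over the handful of available (question, context, answer) triples.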