Prior work on language models (LMs) shows that training on a large number of diverse tasks improves few-shot learning (FSL) performance on new tasks. We take this to the extreme, automatically extracting 413,299 tasks from internet tables, orders of magnitude more than the next-largest public datasets. Finetuning on the resulting dataset improves FSL performance on Natural Language Processing (NLP) tasks, but not proportionally to dataset scale. In fact, we find that narrow subsets of our dataset sometimes outperform more diverse datasets. For example, finetuning on software documentation from support.google.com raises FSL performance by a mean of +7.5% across 52 downstream tasks, which beats training on 40 human-curated NLP datasets (+6.7%). Finetuning on various narrow datasets leads to similarly broad improvements across test tasks, suggesting that the gains come not from domain adaptation but from adapting to FSL in general. We do not observe clear patterns among the datasets that lead to FSL gains, leaving open questions about why certain data helps with FSL.
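As a rough illustration of how a web table could be turned into a few-shot task, the sketch below formats table rows as in-context examples. The abstract does not specify the conversion procedure, so the rule used here (one column designated as the answer, the remaining columns as the input, each row becoming one example) and the example table contents are assumptions for illustration only.

```python
# Illustrative sketch only: the conversion rule (one column as the answer,
# the rest as input, each row as an example) is assumed, not taken from the text.
from typing import Dict, List

def table_to_fewshot_prompt(
    rows: List[Dict[str, str]],
    answer_column: str,
    num_support: int = 3,
) -> str:
    """Render the first `num_support` rows as solved examples and the next row
    as a query with its answer left blank for the model to fill in."""
    def render(row: Dict[str, str], with_answer: bool) -> str:
        inputs = "; ".join(f"{k}: {v}" for k, v in row.items() if k != answer_column)
        answer = row[answer_column] if with_answer else ""
        return f"{inputs}\n{answer_column}: {answer}"

    support = [render(r, with_answer=True) for r in rows[:num_support]]
    query = render(rows[num_support], with_answer=False)
    return "\n\n".join(support + [query])

# Hypothetical support-style table; column names and values are invented.
table = [
    {"error code": "0x80070057", "fix": "re-run the installer with admin rights"},
    {"error code": "0x80004005", "fix": "clear the application cache"},
    {"error code": "0x8007000E", "fix": "close other programs to free memory"},
    {"error code": "0xC0000022", "fix": "repair file permissions"},
]

print(table_to_fewshot_prompt(table, answer_column="fix"))
```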