Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training. Towards this goal, this paper introduces ExMix (Extreme Mixture): a massive collection of 107 supervised NLP tasks across diverse domains and task-families. Using ExMix, we study the effect of multi-task pre-training at the largest scale to date, and analyze co-training transfer amongst common families of tasks. Through this analysis, we show that manually curating an ideal set of tasks for multi-task pre-training is not straightforward, and that multi-task scaling can vastly improve models on its own. Finally, we propose ExT5: a model pre-trained using a multi-task objective of self-supervised span denoising and supervised ExMix. Via extensive experiments, we show that ExT5 outperforms strong T5 baselines on SuperGLUE, GEM, Rainbow, Closed-Book QA tasks, and several tasks outside of ExMix. ExT5 also significantly improves sample efficiency while pre-training.
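To make the training setup described above concrete, the following is a minimal sketch of a multi-task pre-training mixture in the text-to-text style: every task yields ("inputs", "targets") string pairs, and examples are sampled from a weighted mixture that keeps the self-supervised span-denoising objective alongside supervised tasks. All task names, mixing rates, and example generators here are hypothetical stand-ins for illustration, not the paper's actual ExMix tasks or rates.

```python
import random

def span_denoising(text):
    """Toy span corruption: replace one contiguous span with a sentinel token."""
    tokens = text.split()
    start = random.randrange(len(tokens))
    end = min(len(tokens), start + 3)  # corrupt a short span of up to 3 tokens
    inputs = tokens[:start] + ["<extra_id_0>"] + tokens[end:]
    targets = ["<extra_id_0>"] + tokens[start:end]
    return {"inputs": " ".join(inputs), "targets": " ".join(targets)}

def nli_task():
    """Toy supervised task already cast to text-to-text (hypothetical example)."""
    return {"inputs": "nli premise: A dog runs. hypothesis: An animal moves.",
            "targets": "entailment"}

def summarization_task():
    """Another toy supervised task in the same format (hypothetical example)."""
    return {"inputs": "summarize: The committee met on Tuesday to review the plan.",
            "targets": "Committee reviews plan on Tuesday."}

# Weighted mixture: the denoising objective is retained alongside the supervised
# tasks rather than replaced by them. Rates are arbitrary placeholders.
MIXTURE = [
    (lambda: span_denoising("the quick brown fox jumps over the lazy dog"), 0.5),
    (nli_task, 0.25),
    (summarization_task, 0.25),
]

def sample_example():
    """Draw one training example from the weighted task mixture."""
    fns, rates = zip(*MIXTURE)
    return random.choices(fns, weights=rates, k=1)[0]()

if __name__ == "__main__":
    for _ in range(3):
        print(sample_example())
```

In this sketch the sampling rates are fixed by hand; the paper studies how such mixtures behave as the number of supervised tasks grows, rather than prescribing these particular weights.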