Recent work on large language models relies on the intuition that most natural language processing tasks can be described via natural language instructions. Language models trained on these instructions show strong zero-shot performance on several standard datasets. However, these models, impressive as they are, still perform poorly on a wide range of tasks outside their respective training and evaluation sets. To address this limitation, we argue that a model should be able to keep extending its knowledge and abilities without forgetting previous skills. Despite the limited success of Continual Learning, we show that Language Models can be continual learners. We empirically investigate the reason for this success and conclude that Continual Learning emerges from self-supervised pre-training. Our resulting model, Continual-T0 (CT0), is able to learn diverse new tasks while still maintaining good performance on previous tasks, spanning, remarkably, 70 datasets in total. Finally, we show that CT0 is able to combine instructions in ways it was never trained for, demonstrating some compositionality.