Given the success of in-context learning with large pre-trained language models, we introduce in-context learning distillation to transfer the few-shot in-context learning ability of large models to smaller models. We propose combining in-context learning objectives with language modeling objectives to distill into the smaller models both the ability to read in-context examples and task knowledge. We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT). Multitask-ICT achieves better multitask few-shot performance but also requires more computation than Meta-ICT. Our method shows consistent improvements under both Meta-ICT and Multitask-ICT on two benchmarks, LAMA and CrossFit. Extensive experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm: in-context learning objectives achieve the best performance when combined with language modeling objectives.
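The abstract describes the combined training objective only at a high level. Below is a minimal sketch of one way such a combination could look, assuming a teacher-student setup in PyTorch; the names (student_logits, teacher_logits, lm_weight, temperature) and the specific form (temperature-scaled KL to the teacher's next-token distribution plus cross-entropy on the gold targets) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of combining an in-context learning (distillation) objective
# with a language modeling objective on in-context-formatted inputs.
# All names and the exact loss form are assumptions for illustration.
import torch
import torch.nn.functional as F

def in_context_distillation_loss(student_logits, teacher_logits, target_ids,
                                 lm_weight=1.0, temperature=2.0):
    """student_logits, teacher_logits: [batch, seq_len, vocab];
    target_ids: [batch, seq_len], with -100 marking positions to ignore."""
    # In-context learning objective: match the teacher's next-token
    # distribution on demonstration-conditioned inputs (soft-label KL).
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    icl_loss = F.kl_div(s_logp, t_prob, reduction="batchmean") * temperature ** 2

    # Language modeling objective: cross-entropy of the student on the gold
    # target tokens, injecting task knowledge directly.
    lm_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=-100,
    )
    return icl_loss + lm_weight * lm_loss
```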