Pre-trained language models (PLMs) have achieved remarkable success in NLP tasks. Despite this success, mainstream solutions largely follow the pre-training then fine-tuning paradigm, which brings both high deployment costs and low training efficiency. Nevertheless, fine-tuning on a specific task is essential, because PLMs are only pre-trained with the language signal from large-scale raw data. In this paper, we propose a novel fine-tuning-free strategy for language models that considers both the language signal and the teacher signal. The teacher signal is an abstraction of a battery of downstream tasks, provided in a unified proposition format. Trained with both the language signal and the strong task-aware teacher signal in an interactive manner, our FreeLM model demonstrates strong generalization and robustness. In experiments, FreeLM outperforms large models such as GPT-3 and InstructGPT on a range of language understanding tasks, while being much smaller, with only 0.3B parameters compared to the 175B of those models.
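To make the described training strategy concrete, below is a minimal sketch of an "interactive" loop that alternates the two signals batch by batch: a standard causal language-modeling loss (language signal) and a binary correctness loss over verbalized propositions (teacher signal). It assumes a GPT-2-style backbone as a stand-in for the 0.3B-parameter model; the toy batches, the linear proposition head, and the last-token pooling are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Assumption: a small GPT-2 stands in for the FreeLM backbone.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
backbone = GPT2LMHeadModel.from_pretrained("gpt2")
prop_head = nn.Linear(backbone.config.n_embd, 2)  # true/false proposition classifier
optimizer = torch.optim.AdamW(
    list(backbone.parameters()) + list(prop_head.parameters()), lr=1e-5
)

def language_step(texts):
    """Causal language-modeling loss on raw text (the language signal)."""
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = enc.input_ids.clone()
    labels[enc.attention_mask == 0] = -100  # ignore padding positions
    out = backbone(input_ids=enc.input_ids,
                   attention_mask=enc.attention_mask,
                   labels=labels)
    return out.loss

def teacher_step(propositions, labels):
    """Binary correctness loss on unified propositions (the teacher signal)."""
    enc = tokenizer(propositions, return_tensors="pt", padding=True, truncation=True)
    hidden = backbone.transformer(input_ids=enc.input_ids,
                                  attention_mask=enc.attention_mask).last_hidden_state
    # Pool the representation of the last non-padding token.
    last_idx = enc.attention_mask.sum(dim=1) - 1
    pooled = hidden[torch.arange(hidden.size(0)), last_idx]
    return nn.functional.cross_entropy(prop_head(pooled), torch.tensor(labels))

# Toy data purely for illustration: raw text plus downstream-task instances
# rewritten as propositions labeled correct (1) or incorrect (0).
lm_batches = [["Pre-trained language models learn from large raw corpora."]]
proposition_batches = [
    (["the review 'a delightful film' expresses a positive sentiment ."], [1]),
]

# Interactive training: alternate the two signals within each step.
for lm_texts, (props, prop_labels) in zip(lm_batches, proposition_batches):
    loss = language_step(lm_texts) + teacher_step(props, prop_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the teacher signal is expressed in one proposition format across tasks, the same binary head serves every downstream task in this sketch; no task-specific fine-tuning pass is added afterwards.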