Recent developments in large-scale pre-trained language models (PLMs) have significantly improved model capabilities on various NLP tasks, both in performance after task-specific fine-tuning and in zero-shot / few-shot learning. However, many of these models come with a dauntingly large size that few institutions can afford to pre-train, fine-tune, or even deploy, while moderate-sized models usually lack strong generalized few-shot learning capabilities. In this paper, we first elaborate the current obstacles to using PLMs in terms of the Impossible Triangle: 1) moderate model size, 2) state-of-the-art few-shot learning capability, and 3) state-of-the-art fine-tuning capability. We argue that all existing PLMs lack one or more of these properties of the Impossible Triangle. To remedy the missing properties, various techniques have been proposed, such as knowledge distillation, data augmentation, and prompt learning, which inevitably bring additional work to the application of PLMs in real-world scenarios. We then offer insights into future research directions for PLMs to achieve the Impossible Triangle, and break down the task into several key phases.