This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x' that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only make a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website, http://pretrain.nlpedia.ai/, including a constantly-updated survey and paperlist.
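To make the template-then-fill workflow concrete, the following is a minimal sketch of cloze-style prompting for sentiment classification. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; the template, answer words, and verbalizer mapping are illustrative assumptions for this sketch, not choices made in the survey itself.

```python
# Minimal sketch of prompt-based (cloze-style) classification.
# Assumptions: Hugging Face `transformers` is installed; the template and
# verbalizer below are illustrative, not taken from the surveyed paper.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def classify_sentiment(x: str) -> str:
    # Prompting function: wrap the input x in a template with one unfilled
    # slot ([MASK]) to form the prompt x'.
    prompt = f"{x} Overall, it was a [MASK] movie."
    # The language model probabilistically fills the slot; we restrict the
    # answer space to a small verbalizer mapping filled words to labels y.
    verbalizer = {"great": "positive", "good": "positive",
                  "terrible": "negative", "bad": "negative"}
    candidates = fill_mask(prompt, targets=list(verbalizer))
    best = max(candidates, key=lambda c: c["score"])
    # Map the highest-scoring filled word back to the final output y.
    return verbalizer[best["token_str"]]

print(classify_sentiment("I love this movie."))  # expected: "positive"
```

Because the pre-trained model is used as-is, changing the template or the verbalizer is enough to adapt the same model to a different task, which is the few-shot and zero-shot appeal described above.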