Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Utilizing the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models like GPT-3 and XGLM (Lin et al., 2021), despite mT5's approximately 50% fewer parameters. We further show SAP is effective on question answering and summarization. For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.
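The core idea behind SAP can be illustrated concretely. Because mT5 is pre-trained only to fill sentinel spans, it cannot be sampled left-to-right directly; SAP instead calls the model repeatedly, each time appending a sentinel token to the running output and keeping the span the model fills in. The sketch below, written in Python against the Hugging Face transformers API, shows this loop in its simplest form. The checkpoint name, step sizes, and the `sap_generate` helper are illustrative assumptions, not the authors' released code, and the published method adds scoring and filtering steps not captured here.

```python
# A minimal sketch of the sequential prompting loop behind SAP (an assumption
# of how the technique can be realized, not the paper's implementation):
# repeatedly ask mT5 to fill a single <extra_id_0> sentinel appended to the
# end of the running text, then append what it generates.
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

MODEL_NAME = "google/mt5-xl"  # assumption: any mT5 checkpoint would do
tokenizer = MT5Tokenizer.from_pretrained(MODEL_NAME)
model = MT5ForConditionalGeneration.from_pretrained(MODEL_NAME)

def sap_generate(prompt: str, max_steps: int = 40, tokens_per_step: int = 4) -> str:
    """Extend `prompt` by repeatedly filling a trailing <extra_id_0> sentinel."""
    text = prompt
    for _ in range(max_steps):
        inputs = tokenizer(text + " <extra_id_0>", return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=tokens_per_step + 2)
        decoded = tokenizer.decode(output_ids[0], skip_special_tokens=False)
        # Keep only the text the model proposed for the sentinel slot.
        filled = decoded.split("<extra_id_0>")[-1].split("<extra_id_1>")[0]
        filled = filled.replace("<pad>", "").replace("</s>", "").strip()
        if not filled:  # model produced nothing new: stop
            break
        text += " " + filled
    return text

# Few-shot machine translation, echoing the paper's case study
# (the exemplars below are made up for illustration):
prompt = (
    "English: I like apples. French: J'aime les pommes.\n"
    "English: The weather is nice. French:"
)
print(sap_generate(prompt))
```

Generating only a few tokens per call keeps each step inside the span-infilling regime the model was pre-trained on, which is what lets a purely bidirectional denoising model behave autoregressively.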