In this work, we explore "prompt tuning", a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed "prefix tuning" of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
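To make the mechanism concrete, below is a minimal sketch (not the authors' code) of the core idea: a small matrix of learnable "soft prompt" embeddings is prepended to the input token embeddings, while every weight of the pretrained model stays frozen and only the prompt is updated by backpropagation. Names such as `SoftPromptWrapper`, `frozen_lm`, and `prompt_length` are illustrative assumptions, not part of the paper.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Hypothetical wrapper: prepends trainable soft-prompt vectors to a frozen LM."""

    def __init__(self, frozen_lm: nn.Module, embed: nn.Embedding, prompt_length: int = 20):
        super().__init__()
        self.frozen_lm = frozen_lm   # pretrained model, kept frozen
        self.embed = embed           # pretrained token embedding table, kept frozen
        for p in self.frozen_lm.parameters():
            p.requires_grad = False
        for p in self.embed.parameters():
            p.requires_grad = False
        # The only trainable parameters: prompt_length soft-prompt vectors,
        # here initialized from randomly sampled rows of the embedding table
        # (one of several plausible initialization choices).
        idx = torch.randint(embed.num_embeddings, (prompt_length,))
        self.soft_prompt = nn.Parameter(embed.weight[idx].detach().clone())

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        batch = input_ids.size(0)
        tok = self.embed(input_ids)                                   # [B, T, d]
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)  # [B, P, d]
        inputs_embeds = torch.cat([prompt, tok], dim=1)               # [B, P+T, d]
        # Assumes the frozen model accepts precomputed embeddings as input.
        return self.frozen_lm(inputs_embeds)
```

In training, the optimizer would be built over `[wrapper.soft_prompt]` alone, so a single frozen model can be shared across tasks, with each task carrying only its own small prompt.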