Extremely large pre-trained language models (PTMs) such as GPT-3 are usually released as a service, allowing users to design task-specific prompts to query the PTMs through black-box APIs. In such a scenario, which we call Language-Model-as-a-Service (LMaaS), the gradients of PTMs are usually unavailable. Can we optimize the task prompts by only accessing the model inference APIs? This paper proposes a black-box tuning framework that optimizes the continuous prompt prepended to the input text via derivative-free optimization. Instead of optimizing in the original high-dimensional prompt space, which is intractable for traditional derivative-free optimization, we perform optimization in a randomly generated subspace, exploiting the low intrinsic dimensionality of large PTMs. The experimental results show that black-box tuning with RoBERTa on a few labeled samples not only significantly outperforms manual prompts and GPT-3's in-context learning, but also surpasses the gradient-based counterparts, i.e., prompt tuning and full model tuning.
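To make the setup concrete, below is a minimal sketch of the idea in the abstract: the continuous prompt is parameterized in a low-dimensional subspace through a fixed random projection, and a derivative-free optimizer (here CMA-ES via the `cma` package) searches that subspace by repeatedly querying a black-box loss. The function `query_loss`, the toy loss inside it, and the exact dimensions are illustrative assumptions rather than the paper's precise configuration.

```python
# Minimal sketch of black-box prompt tuning in a random subspace (assumed setup).
import numpy as np
import cma  # pip install cma

D = 50 * 1024   # full prompt dimension: 50 prompt tokens x hidden size 1024 (assumed)
d = 500         # low intrinsic dimension actually optimized (assumed)

# Fixed random projection from the d-dim subspace to the D-dim prompt space.
rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(D, d)) / np.sqrt(d)

def query_loss(prompt_embedding: np.ndarray) -> float:
    # Hypothetical placeholder for the black-box inference API: in practice this
    # would send the prompt embedding plus a batch of labeled samples to the PTM
    # service and return the task loss. A toy quadratic stands in so the sketch runs.
    return float(np.sum(prompt_embedding ** 2))

# Derivative-free optimization (CMA-ES) over the low-dimensional variable z.
es = cma.CMAEvolutionStrategy(d * [0.0], 1.0, {"popsize": 20, "maxiter": 50})
while not es.stop():
    candidates = es.ask()                                  # sample candidate z's
    losses = [query_loss(A @ np.asarray(z)) for z in candidates]
    es.tell(candidates, losses)                            # update the search distribution

best_prompt = A @ es.result.xbest                          # final prompt embedding
```

Only the scalar loss returned by the inference API is needed, which is what makes the approach applicable when gradients of the PTM are unavailable.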