Extremely large pre-trained language models (PTMs) such as GPT-3 are usually released as a service, allowing users to design task-specific prompts to query the PTMs through black-box APIs. In such a scenario, which we call Language-Model-as-a-Service (LMaaS), the gradients of PTMs are usually unavailable. Can we optimize the task prompts by only accessing the model inference APIs? This paper proposes the black-box tuning framework to optimize the continuous prompt prepended to the input text via derivative-free optimization. Instead of optimizing in the original high-dimensional prompt space, which is intractable for traditional derivative-free optimization, we perform optimization in a randomly generated subspace due to the low intrinsic dimensionality of large PTMs. The experimental results show that black-box tuning with RoBERTa on a few labeled samples not only significantly outperforms manual prompts and GPT-3's in-context learning, but also surpasses the gradient-based counterparts, i.e., prompt tuning and full model tuning.
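A minimal sketch of the subspace idea described above, under stated assumptions: `query_ptm_api` is a hypothetical stand-in for the provider's inference API, the dimensions are illustrative, and a simple (1+1)-style random search stands in for the derivative-free optimizer (a method such as CMA-ES would be a natural choice in practice). A fixed random matrix projects a low-dimensional vector into the full soft-prompt space, so only a small number of parameters are ever optimized.

```python
import numpy as np

def query_ptm_api(prompt_embedding: np.ndarray) -> float:
    # Stand-in for the real black-box inference API: here a toy quadratic
    # loss so the sketch runs end-to-end; in LMaaS this would be a network
    # call returning the task loss on a few labeled samples.
    return float(np.mean((prompt_embedding - 0.5) ** 2))

d, n_tokens, hidden = 500, 50, 1024            # subspace dim, prompt length, hidden size (illustrative)
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0 / d, size=(n_tokens * hidden, d))  # fixed random projection into prompt space

best_z, best_loss = np.zeros(d), float("inf")
for step in range(200):
    z = best_z + rng.normal(0.0, 0.1, size=d)  # propose a candidate in the low-dimensional subspace
    loss = query_ptm_api((A @ z).reshape(n_tokens, hidden))
    if loss < best_loss:                        # greedy derivative-free update: keep the better candidate
        best_z, best_loss = z, loss

soft_prompt = (A @ best_z).reshape(n_tokens, hidden)  # final continuous prompt to prepend at inference time
```

The key design point the sketch illustrates is that the optimizer never sees gradients or the high-dimensional prompt directly; it only proposes low-dimensional vectors and observes scalar losses returned by the API.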