Prior work has shown that prompt-based learning is an efficient way to make use of large pre-trained language models. Recent work also demonstrates the possibility of steering a chatbot's output by plugging in an appropriate prompt. Gradient-based methods are often used to perturb these prompts; however, some language models are not even available to the public. In this work, we first explore combining prompting with reinforcement learning (RL) to steer a model's generation without accessing any of the model's parameters. Second, to reduce training effort and enhance generalizability to unseen tasks, we apply multi-task learning so that the model learns to generalize to new tasks better. Experimental results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters. Furthermore, the model demonstrates a strong ability to quickly adapt to an unseen task in fewer steps than the baseline model.
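The core idea of steering a black-box model without its parameters can be sketched with a score-function (REINFORCE-style) update over discrete prompt tokens: sample a prompt from a learned policy, observe only the model's reply, score it with a reward, and update the policy from that scalar alone. The sketch below is illustrative, not the paper's actual method: the toy `blackbox_reply` stand-in, the tiny vocabulary, and the word-count reward are all hypothetical placeholders for a real chatbot API and a learned reward model.

```python
import math
import random

# Hypothetical toy setup: a tiny vocabulary and a stand-in "black box".
VOCAB = ["please", "be", "happy", "sad", "very", "kind"]

def blackbox_reply(prompt_tokens):
    # A real system would call a SOTA dialogue model's API here; we only
    # ever observe its output, never gradients or parameters.
    return prompt_tokens  # toy: the model simply echoes the prompt

def reward(reply):
    # Hypothetical steering objective: prefer positive words in the reply.
    return sum(tok in {"happy", "kind"} for tok in reply)

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sample(probs):
    r, c = random.random(), 0.0
    for i, p in enumerate(probs):
        c += p
        if r < c:
            return i
    return len(probs) - 1

random.seed(0)
PROMPT_LEN, LR, STEPS = 3, 0.5, 300
# Policy: independent per-position preference scores over the vocabulary.
scores = [[0.0] * len(VOCAB) for _ in range(PROMPT_LEN)]
baseline = 0.0  # moving-average baseline to reduce gradient variance

for _ in range(STEPS):
    probs = [softmax(s) for s in scores]
    idxs = [sample(p) for p in probs]
    prompt = [VOCAB[i] for i in idxs]
    r = reward(blackbox_reply(prompt))
    baseline = 0.9 * baseline + 0.1 * r
    adv = r - baseline
    # Score-function update: raise chosen tokens' scores when the reward
    # beats the baseline, lower them otherwise. No model gradients needed.
    for pos, i in enumerate(idxs):
        for j in range(len(VOCAB)):
            grad = (1.0 if j == i else 0.0) - probs[pos][j]
            scores[pos][j] += LR * adv * grad

best = [VOCAB[max(range(len(VOCAB)), key=lambda j: s[j])] for s in scores]
print(best)
```

The only interaction with the dialogue model is the `blackbox_reply` call, which is what makes this style of optimization applicable to models whose parameters are not publicly available; multi-task training, as described above, would amount to sharing the prompt policy across several reward functions.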