Careful prompt design is critical to the use of large language models in zero-shot or few-shot learning. As a consequence, there is a growing interest in automated methods for designing optimal prompts. In this work, we propose Test-time Prompt Editing using Reinforcement Learning (TEMPERA). In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge, is adaptive to different queries, and provides an interpretable prompt for every query. To achieve this, we design a novel action space that allows flexible editing of the initial prompts, covering a wide set of commonly used components such as instructions, few-shot exemplars, and verbalizers. The proposed method achieves significant gains over recent SoTA approaches like prompt tuning, AutoPrompt, and RLPrompt across a variety of tasks, including sentiment analysis, topic classification, natural language inference, and reading comprehension. Our method achieves a 5.33x average improvement in sample efficiency compared with traditional fine-tuning methods.
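To make the notion of an edit-based action space concrete, the following is a minimal sketch. The component names (instruction, few-shot exemplars, verbalizer) follow the abstract, but the specific actions, class names, and function signatures below are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
import itertools

# Hypothetical sketch of a TEMPERA-style discrete edit action space.
# The concrete edits (swapping exemplars, replacing verbalizer tokens)
# are assumed for illustration.

@dataclass
class Prompt:
    instruction: str
    exemplars: list   # few-shot examples, formatted as strings
    verbalizer: dict  # label -> surface token, e.g. {0: "bad", 1: "good"}

    def render(self, query: str) -> str:
        # Compose the full prompt text for a given test query.
        return "\n".join([self.instruction] + self.exemplars + [query])

def swap_exemplars(p: Prompt, i: int, j: int) -> Prompt:
    # Edit action: reorder two few-shot exemplars.
    ex = p.exemplars[:]
    ex[i], ex[j] = ex[j], ex[i]
    return Prompt(p.instruction, ex, p.verbalizer)

def set_verbalizer(p: Prompt, label, token) -> Prompt:
    # Edit action: map a label to a different surface token.
    vb = dict(p.verbalizer)
    vb[label] = token
    return Prompt(p.instruction, p.exemplars, vb)

def enumerate_actions(p: Prompt, candidate_tokens):
    """All single-step edits reachable from prompt p: this set is the
    (per-step) action space an RL policy would choose from."""
    actions = []
    for i, j in itertools.combinations(range(len(p.exemplars)), 2):
        actions.append(("swap", i, j))
    for label, current in p.verbalizer.items():
        for tok in candidate_tokens:
            if tok != current:
                actions.append(("verbalize", label, tok))
    return actions
```

An RL policy would score each query-conditioned action, apply the chosen edit, and repeat for a fixed budget of steps, yielding a per-query, human-readable prompt.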