Like people, LLMs do not always produce the best output for a given generation task (e.g., summaries, answers, explanations) on their first try. Motivated by how people refine their own writing, we introduce SELF-REFINE, a framework for improving initial LLM outputs through iterative feedback and refinement. The main idea is to generate an output with an LLM, have the same model provide multi-aspect feedback on its own output, and then have it refine the previously generated output given that feedback. Unlike earlier work, our iterative refinement framework requires no supervised training data or reinforcement learning and works with a single LLM. We experiment with 7 diverse tasks, ranging from review rewriting to math reasoning, and show that our approach outperforms direct generation. Across all tasks, outputs generated with SELF-REFINE are preferred by humans and by automated metrics over those generated directly with GPT-3.5 and GPT-4, improving by 20% absolute on average across tasks.
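A minimal sketch of the generate–feedback–refine loop described above, assuming a hypothetical `llm(prompt)` helper that sends a prompt to a single underlying model and returns its text completion; the prompt wording, iteration cap, and stopping rule are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch of the SELF-REFINE loop: one LLM acts as generator,
# feedback provider, and refiner. `llm` is a hypothetical placeholder
# for a call to GPT-3.5/GPT-4 or another model; the prompt templates
# and stopping condition below are assumptions for illustration.

def llm(prompt: str) -> str:
    """Placeholder for a call to the underlying language model."""
    raise NotImplementedError

def self_refine(task: str, max_iterations: int = 4) -> str:
    # Step 1: generate an initial output with the LLM.
    output = llm(f"Task: {task}\nProduce an initial answer.")

    for _ in range(max_iterations):
        # Step 2: the same LLM gives multi-aspect feedback on its own output.
        feedback = llm(
            f"Task: {task}\nOutput: {output}\n"
            "Give concrete feedback on how to improve this output."
        )
        # Stop once the model judges its own output needs no further changes
        # (an assumed convention for this sketch).
        if "no further improvement" in feedback.lower():
            break
        # Step 3: the same LLM refines its previous output using its own feedback.
        output = llm(
            f"Task: {task}\nOutput: {output}\nFeedback: {feedback}\n"
            "Rewrite the output, addressing the feedback."
        )
    return output
```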