Large language models (LMs), while powerful, are not immune to mistakes, yet can be difficult to retrain. Our goal is for an LM to continue to improve after deployment, without retraining, using feedback from the user. Our approach pairs an LM with (i) a growing memory of cases where the user identified an output error and provided general feedback on how to correct it, and (ii) a corrector model, trained to translate this general feedback into specific edits that repair the model's output. Given a new, unseen input, our model can then use feedback from similar past cases to repair output errors that may occur. We instantiate our approach with an existing, fixed model for script generation, which takes a goal (e.g., "bake a cake") and generates a partially ordered sequence of actions to achieve that goal, sometimes containing errors. Our memory-enhanced system, FBNet, learns to apply user feedback to repair such errors (up to a 30-point improvement), while making a start at avoiding similar past mistakes on new, unseen examples (up to a 7-point improvement in a controlled setting). This is a first step towards strengthening deployed models, potentially broadening their utility. Our code and data are available at https://github.com/allenai/interscript/.
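To make the memory-plus-corrector loop concrete, the sketch below illustrates the general pattern described above: store user feedback, retrieve feedback from similar past cases for a new input, and let a corrector apply it as edits. This is a minimal illustration under assumed interfaces; the class and function names (FeedbackMemory, repair, corrector) are hypothetical placeholders and are not taken from the interscript codebase.

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackMemory:
    """Growing store of (goal, user feedback) cases (hypothetical interface)."""
    cases: list = field(default_factory=list)

    def add(self, goal: str, feedback: str) -> None:
        self.cases.append((goal, feedback))

    def retrieve(self, goal: str, k: int = 1) -> list:
        # Toy similarity: word overlap between the new goal and stored goals.
        def overlap(stored_goal: str) -> int:
            return len(set(stored_goal.split()) & set(goal.split()))
        ranked = sorted(self.cases, key=lambda c: overlap(c[0]), reverse=True)
        return ranked[:k]

def repair(goal: str, script: list, memory: FeedbackMemory, corrector) -> list:
    """Retrieve general feedback from similar past cases and let a corrector
    model (here an arbitrary callable) translate it into specific edits."""
    for _, feedback in memory.retrieve(goal):
        script = corrector(goal, script, feedback)  # hypothetical corrector call
    return script
```

In this sketch the corrector is passed in as a plain callable; in the actual system it is a trained model that maps (generated script, general feedback) to an edited script.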