Goal-oriented dialogue systems face a trade-off between fluent language generation and task-specific control. While supervised learning with large language models can produce fluent, realistic text, how to steer such models toward completing a specific task without sacrificing language quality remains an open question. In this work, we formulate goal-oriented dialogue as a partially observed Markov decision process, interpreting the language model as a representation of both the dynamics and the policy. This view allows us to extend techniques from learning-based control, such as task relabeling, to derive a simple and effective method for finetuning language models in a goal-aware way, leading to significantly improved task performance. We additionally introduce a number of training strategies that serve to better focus the model on the task at hand. We evaluate our method, Context-Aware Language Models (CALM), on a practical flight-booking task using AirDialogue. Empirically, CALM outperforms the state-of-the-art method by 7% in terms of task success, matching human-level task performance.
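To make the task-relabeling idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): a dialogue whose outcome diverged from its intended goal is relabeled so that the goal matches what was actually achieved, and the relabeled transcript is then serialized into a goal-conditioned sequence suitable for language-model finetuning. All function names, dictionary keys, and special tokens (`<goal>`, `<dialogue>`) here are illustrative assumptions.

```python
# Hypothetical sketch of hindsight task relabeling for goal-conditioned
# LM finetuning; names and tokens are illustrative, not from the paper.

def relabel(dialogue):
    """Replace the intended goal with the outcome the dialogue actually achieved."""
    return {**dialogue, "goal": dialogue["outcome"]}

def to_training_text(dialogue):
    """Serialize a goal-conditioned dialogue into one LM training sequence."""
    turns = " ".join(f"{speaker}: {utt}" for speaker, utt in dialogue["turns"])
    return f"<goal> {dialogue['goal']} <dialogue> {turns}"

# A "failed" trajectory: the agent intended flight A but booked flight B.
# After relabeling, it becomes a successful example of booking flight B,
# so the model can still learn goal-conditioned behavior from it.
raw = {
    "goal": "book flight A",
    "outcome": "book flight B",
    "turns": [("customer", "I need a flight."), ("agent", "Booked flight B.")],
}
example = to_training_text(relabel(raw))
```

In this sketch, a trajectory that would otherwise count as a task failure is converted into a valid training example for the goal it did achieve, which is the core trick that relabeling imports from learning-based control.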