In task-oriented dialogs such as MultiWoZ (Budzianowski et al., 2018), an informative and successful system response must include key information, such as the phone number of a hotel. We therefore hypothesize that helping the model focus on learning key quantities in the dialog enables it to generate more informative and helpful responses. In this paper, we propose a new training algorithm, Reinforced Language Modeling (RLM), which uses a fine-grained reward function and reinforcement learning to encourage the model to generate key quantities correctly at test time. Empirical results show that our proposed RLM achieves state-of-the-art performance on the inform rate, success rate, and combined score on MultiWoZ.
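To make the idea of a fine-grained reward concrete, the sketch below assigns a per-token reward that upweights key quantities (e.g., a phone number) relative to ordinary tokens. This is a minimal illustration, not the paper's actual reward function: the helper name `token_rewards`, the weighting scheme, and the reward magnitudes are all assumptions for exposition.

```python
def token_rewards(generated, reference, key_values,
                  key_weight=2.0, base_weight=0.5):
    """Hypothetical fine-grained reward: score each generated token,
    giving a larger bonus/penalty when the token belongs to a key
    quantity (e.g., a phone number) than when it is an ordinary word.

    generated, reference: lists of tokens
    key_values: list of key strings (e.g., slot values) to upweight
    """
    # Collect all tokens that appear in any key quantity.
    key_tokens = {tok for value in key_values for tok in value.split()}
    rewards = []
    for pos, tok in enumerate(generated):
        correct = pos < len(reference) and reference[pos] == tok
        if tok in key_tokens:
            # Key quantities get a strong positive or negative signal.
            rewards.append(key_weight if correct else -key_weight)
        else:
            # Ordinary tokens get a small reward only when correct.
            rewards.append(base_weight if correct else 0.0)
    return rewards


generated = ["the", "phone", "number", "is", "01223", "356236"]
reference = ["the", "phone", "number", "is", "01223", "356236"]
print(token_rewards(generated, reference, ["01223 356236"]))
```

In a full RLM-style setup, such per-token rewards would feed a policy-gradient update so that errors on key quantities are penalized more heavily than errors on filler words.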