In task-oriented dialogs such as MultiWOZ (Budzianowski et al., 2018), an informative and successful system response must include key information such as the phone number of a hotel. We therefore hypothesize that a model trained to focus on generating these key quantities correctly can achieve better overall performance. In this paper, we propose a new training algorithm, Keywords Reinforcement Language Modeling (KRLM), which combines a fine-grained reward function for each token with a new per-token reinforcement learning procedure to help the model generate keywords more robustly at inference time. Empirical results show that the proposed KRLM training algorithm achieves state-of-the-art inform rate, success rate, and combined score on the MultiWOZ benchmark dataset.
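To make the idea of a per-token, keyword-focused reward concrete, the sketch below shows one way such an objective could be wired up: each target token receives a reward, keyword tokens (e.g., slot values such as phone numbers) receive a larger reward, and the token-level log-likelihoods are scaled by these rewards in a REINFORCE-style loss. This is only an illustrative approximation under assumed details; the keyword set, reward values, and the functions `per_token_rewards` and `krlm_style_loss` are hypothetical placeholders, not the paper's exact KRLM formulation.

```python
# Minimal sketch of a keyword-weighted per-token objective (illustrative only).
import torch
import torch.nn.functional as F

def per_token_rewards(target_ids, keyword_ids, base=0.1, keyword_bonus=1.0):
    """Assign every target token a base reward, plus a bonus for keyword tokens."""
    rewards = torch.full_like(target_ids, base, dtype=torch.float)
    is_keyword = torch.isin(target_ids, keyword_ids)        # True where the token is a keyword
    return rewards + keyword_bonus * is_keyword.float()

def krlm_style_loss(logits, target_ids, keyword_ids):
    """REINFORCE-style loss: each token's log-likelihood is scaled by its reward."""
    log_probs = F.log_softmax(logits, dim=-1)                                 # (batch, seq, vocab)
    token_log_probs = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)  # (batch, seq)
    rewards = per_token_rewards(target_ids, keyword_ids)                      # (batch, seq)
    return -(rewards * token_log_probs).mean()

# Toy usage: batch of 2 sequences of length 5 over a vocabulary of 50;
# token ids 7 and 11 stand in for keyword tokens (e.g., digits of a phone number).
logits = torch.randn(2, 5, 50, requires_grad=True)
targets = torch.randint(0, 50, (2, 5))
keywords = torch.tensor([7, 11])
loss = krlm_style_loss(logits, targets, keywords)
loss.backward()
```

Under this weighting, mistakes on keyword tokens contribute more to the gradient than mistakes on ordinary tokens, which is one simple way to bias training toward generating key quantities correctly.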