Multi-action dialog policy (MADP), which generates multiple atomic dialog actions per turn, has been widely applied in task-oriented dialog systems to provide expressive and efficient system responses. Existing MADP models usually imitate action combinations from labeled multi-action dialog samples. Due to data limitations, they generalize poorly to unseen dialog flows. While interactive learning and reinforcement learning algorithms can incorporate external data sources such as real users and user simulators, these resources take significant manual effort to build, and the resulting training suffers from instability. To address these issues, we propose Planning Enhanced Dialog Policy (PEDP), a novel multi-task learning framework that learns single-action dialog dynamics to enhance multi-action prediction. PEDP employs model-based planning to conceive what to express before deciding on the current response, by simulating single-action dialogs. Experimental results on the MultiWOZ dataset demonstrate that our fully supervised learning-based method achieves a task success rate of 90.6%, a 3% improvement over state-of-the-art methods.
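To make the plan-then-predict idea concrete, the following is a minimal sketch of how single-action planning could feed a multi-action head, assuming a learned transition (world) model over dialog states. All class names, dimensions, and module choices here are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Hypothetical sketch: simulate a few single-action dialog steps with a
# learned world model, then predict the multi-action response from the
# planned state. All names and shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn


class SingleActionPlanner(nn.Module):
    """Rolls out single-action dialog steps using a learned transition model."""

    def __init__(self, state_dim: int, n_actions: int, horizon: int = 3):
        super().__init__()
        self.horizon = horizon
        self.policy = nn.Linear(state_dim, n_actions)  # picks one atomic action per step
        self.world_model = nn.Linear(state_dim + n_actions, state_dim)  # predicts next state

    def forward(self, state: torch.Tensor):
        planned_logits = []
        for _ in range(self.horizon):
            logits = self.policy(state)
            action = torch.softmax(logits, dim=-1)  # soft action keeps the rollout differentiable
            planned_logits.append(logits)
            state = torch.tanh(self.world_model(torch.cat([state, action], dim=-1)))
        return planned_logits, state


class MultiActionHead(nn.Module):
    """Aggregates the planned rollout into one multi-action prediction."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.head = nn.Linear(state_dim, n_actions)

    def forward(self, plan_state: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.head(plan_state))  # independent probability per atomic action


if __name__ == "__main__":
    state_dim, n_actions = 64, 20
    planner = SingleActionPlanner(state_dim, n_actions)
    head = MultiActionHead(state_dim, n_actions)
    dialog_state = torch.randn(1, state_dim)  # toy stand-in for an encoded belief state
    _, plan_state = planner(dialog_state)     # "conceive what to express" via simulated steps
    multi_action = head(plan_state)           # decide the current multi-action response
    print(multi_action.shape)                 # torch.Size([1, 20])
```

In this sketch, the planner and the multi-action head would be trained jointly (the multi-task setup the abstract describes), with the single-action rollouts supervised by single-action dialog dynamics and the head supervised by the labeled multi-action responses.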