Pre-trained models have proved powerful for enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, achieving new state-of-the-art results on the benchmark datasets In-Car, MultiWOZ2.0, and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3, and 5.5 points, respectively. We also show that GALAXY has stronger few-shot ability than existing models under various low-resource settings.
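To make the consistency-regularization idea concrete, the sketch below computes a symmetric KL divergence between two predicted dialog-act distributions for the same unlabeled dialog (e.g., from two dropout-perturbed forward passes) and applies a simple gate that zeroes out samples whose predictions disagree too much. This is a minimal illustration, not the paper's implementation: the function names, the hard threshold `tau`, and the binary gate are assumptions for exposition, whereas the actual gating mechanism in the model is learned.

```python
import math

def kl_divergence(p, q):
    """KL divergence KL(p || q) between two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def gated_consistency_loss(p1, p2, tau=0.5):
    """Symmetric KL consistency term with a hypothetical hard gate.

    p1, p2: two predicted dialog-act distributions for the same
            unlabeled dialog (e.g., two dropout-perturbed passes).
    tau:    illustrative disagreement threshold; samples whose
            symmetric KL exceeds tau are down-weighted to zero.
    """
    d = 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))
    gate = 1.0 if d < tau else 0.0  # keep only low-disagreement samples
    return gate * d
```

Identical predictions yield zero loss, mildly differing predictions yield a small positive penalty that pulls the two distributions together, and strongly conflicting predictions are gated out so unreliable unlabeled samples do not distort the learned policy representation.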