Task-oriented dialog systems let users accomplish their goals through intuitive and expressive natural language interactions. State-of-the-art approaches formulate the problem as a conditional sequence generation task and fine-tune pre-trained causal language models in a supervised setting. This requires labeled training data for each new domain or task, and acquiring such data is prohibitively laborious and expensive, which makes it a bottleneck for scaling systems to a wide range of domains. To overcome this challenge, we introduce ZS-ToD, a novel Zero-Shot generalizable end-to-end Task-oriented Dialog system that leverages domain schemas to generalize robustly to unseen domains and exploits effective summarization of the dialog history. We employ GPT-2 as the backbone model and introduce a two-step training process: the first step learns the general structure of the dialog data, and the second step optimizes response generation as well as intermediate outputs such as the dialog state and system actions. In contrast to state-of-the-art systems, which are trained to fulfill certain intents in the given domains and memorize task-specific conversational patterns, ZS-ToD learns generic task-completion skills by comprehending domain semantics via domain schemas, and thereby generalizes seamlessly to unseen domains. We conduct an extensive experimental evaluation on the SGD and SGD-X datasets, which span up to 20 unique domains. ZS-ToD outperforms state-of-the-art systems on key metrics, with an improvement of +17% on joint goal accuracy and +5 on inform. Additionally, we present a detailed ablation study to demonstrate the effectiveness of the proposed components and training mechanism.
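For readers unfamiliar with the joint goal accuracy metric cited in the abstract, the following is a minimal sketch of how it is commonly computed: a turn counts as correct only if the entire predicted dialog state matches the reference. The flat slot-to-value dictionary and the function name `joint_goal_accuracy` are illustrative assumptions, not the authors' evaluation code, and exact string matching is a simplification; an official evaluation may score some slot values more leniently.

```python
from typing import Dict, List

# Hypothetical representation: one dialog state per turn, mapping slot name to value.
DialogState = Dict[str, str]


def joint_goal_accuracy(predicted: List[DialogState], gold: List[DialogState]) -> float:
    """Fraction of turns whose predicted state matches the gold state exactly.

    Assumption: exact match on every slot. A single wrong, missing, or extra
    slot makes the whole turn count as incorrect.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


if __name__ == "__main__":
    gold_states = [
        {"city": "Paris"},
        {"city": "Paris", "date": "today"},
    ]
    predicted_states = [
        {"city": "Paris"},
        {"city": "Paris", "date": "tomorrow"},  # one wrong slot fails the whole turn
    ]
    print(joint_goal_accuracy(predicted_states, gold_states))
```

Because a single slot error invalidates the whole turn, the metric is strict, which is why it is a demanding measure of dialog state quality.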