This paper presents UBAR, a task-oriented dialog system that models dialogs at the session level. Specifically, UBAR is obtained by fine-tuning the large pre-trained unidirectional language model GPT-2 on the sequence of an entire dialog session, which is composed of the user utterance, belief state, database result, system act, and system response of every dialog turn. Additionally, UBAR is evaluated in a more realistic setting, in which its dialog context has access to user utterances and to all content it has generated itself, such as belief states, system acts, and system responses. Experimental results on the MultiWOZ datasets show that UBAR achieves state-of-the-art performance in multiple settings, improving the combined score of response generation, policy optimization, and end-to-end modeling by 4.7, 3.5, and 9.4 points, respectively. Thorough analyses demonstrate that the session-level training sequence formulation and the generated dialog context are essential for UBAR to operate as a fully end-to-end task-oriented dialog system in real life. We also examine UBAR's ability to transfer to new domains with limited data, and provide a visualization and a case study to illustrate the advantages of modeling at the dialog session level.
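To make the session-level formulation concrete, the following minimal sketch flattens a dialog session into a single training string in the order described above: user utterance, belief state, database result, system act, and system response for each turn. The `Turn` class, the helper names, and the `<sos_*>`/`<eos_*>` delimiter tokens are illustrative assumptions, not details taken from the text; tokenization and GPT-2 fine-tuning are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One dialog turn (hypothetical container, for illustration only)."""
    user: str      # user utterance
    belief: str    # belief state
    db: str        # database result
    act: str       # system act
    response: str  # system response


def wrap(tag: str, text: str) -> str:
    # Assumed delimiter scheme: <sos_x> ... <eos_x> around each segment.
    return f"<sos_{tag}> {text} <eos_{tag}>"


def flatten_turn(turn: Turn) -> str:
    # Segments appear in the order given in the abstract.
    segments = [
        wrap("u", turn.user),
        wrap("b", turn.belief),
        wrap("db", turn.db),
        wrap("a", turn.act),
        wrap("r", turn.response),
    ]
    return " ".join(segments)


def flatten_session(turns: List[Turn]) -> str:
    # The whole session becomes one sequence, so every later turn is
    # conditioned on all earlier utterances, states, acts, and responses.
    return " ".join(flatten_turn(t) for t in turns)


if __name__ == "__main__":
    session = [
        Turn("i need a hotel in the north", "[hotel] area north", "[db_2]",
             "[hotel] [request] price", "what price range would you like ?"),
        Turn("something cheap", "[hotel] area north price cheap", "[db_1]",
             "[hotel] [inform] name", "i found one cheap hotel for you ."),
    ]
    print(flatten_session(session))
```

In the evaluation setting described above, the belief state, system act, and system response segments of earlier turns would be the model's own generations rather than ground-truth annotations; only the user utterances come from outside the model.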