This paper presents UBAR, a task-oriented dialog system that models dialogs at the session level. Specifically, UBAR is obtained by fine-tuning the large pre-trained unidirectional language model GPT-2 on the sequence of an entire dialog session, which is composed of the user utterance, belief state, database result, system act, and system response of every dialog turn. Additionally, UBAR is evaluated in a more realistic setting, where its dialog context has access to the user utterances and all the content it has generated, such as belief states, system acts, and system responses. Experimental results on the MultiWOZ datasets show that UBAR achieves state-of-the-art performance in multiple settings, improving the combined score of response generation, policy optimization, and end-to-end modeling by 4.7, 3.5, and 9.4 points, respectively. Thorough analyses demonstrate that the session-level training sequence formulation and the generated dialog context are essential for UBAR to operate as a fully end-to-end task-oriented dialog system in real-world use. We also examine how well UBAR transfers to new domains with limited data, and provide visualization and a case study to illustrate the advantages of modeling at the dialog session level.
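The session-level sequence described above can be illustrated with a minimal sketch. It assumes each turn is already available as five text fields; the `Turn` class, the helper names, and the `<sos_*>`/`<eos_*>` marker strings are illustrative assumptions rather than details stated in this abstract, and no model or tokenizer is involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One dialog turn, split into the five components named in the abstract."""
    user: str      # user utterance
    belief: str    # belief state
    db: str        # database result
    act: str       # system act
    response: str  # system response


def wrap(tag: str, text: str) -> str:
    # Hypothetical boundary markers; the real special tokens may differ.
    return f"<sos_{tag}> {text} <eos_{tag}>"


def flatten_turn(turn: Turn) -> str:
    """Serialize one turn in the order: user, belief, db, act, response."""
    return " ".join([
        wrap("u", turn.user),
        wrap("b", turn.belief),
        wrap("db", turn.db),
        wrap("a", turn.act),
        wrap("r", turn.response),
    ])


def flatten_session(turns: List[Turn]) -> str:
    """Concatenate every turn of a session into one training sequence."""
    return " ".join(flatten_turn(t) for t in turns)


def context_for_turn(turns: List[Turn], index: int) -> str:
    """Context at turn `index`: all earlier turns in full (including their
    belief states, acts, and responses) plus the current user utterance."""
    history = [flatten_turn(t) for t in turns[:index]]
    history.append(wrap("u", turns[index].user))
    return " ".join(history)
```

In the evaluation setting the abstract describes, the belief state, system act, and response fields of the earlier turns would hold the model's own generations rather than ground-truth annotations; the sketch only shows how such a context could be assembled.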