Previous dialogue summarization datasets mainly focus on open-domain chitchat, while summarization datasets for the widely used task-oriented dialogue setting remain unexplored. Automatically summarizing such task-oriented dialogues can help a business collect and review user needs to improve its service. Moreover, previous datasets emphasize generating summaries with high ROUGE scores, but they hardly capture the structured information of dialogues and ignore the factuality of summaries. In this paper, we introduce TODSum, a large-scale public Task-Oriented Dialogue Summarization dataset, which aims to summarize the key points of an agent completing certain tasks with a user. Compared to existing work, TODSum exhibits severe information scattering and requires strict factual consistency, which makes it hard to directly apply recent dialogue summarization models. Therefore, we introduce additional dialogue state knowledge for TODSum to enhance the faithfulness of generated summaries. We hope that a better understanding of conversational content helps summarization models generate concise and coherent summaries. Meanwhile, we establish a comprehensive benchmark for TODSum and propose a state-aware structured dialogue summarization model that integrates dialogue state information and dialogue history. Extensive experiments and qualitative analysis demonstrate the effectiveness of dialogue structure guidance. Finally, we discuss the current issues of TODSum and potential directions for future work.