Dialogue summarization helps readers capture salient information from long conversations in meetings, interviews, and TV series. However, real-world dialogues pose a significant challenge to current summarization models, as the dialogue length typically exceeds the input limits imposed by recent transformer-based pre-trained models, and the interactive nature of dialogues makes relevant information more context-dependent and sparsely distributed than in news articles. In this work, we perform a comprehensive study on long dialogue summarization by investigating three strategies to handle the lengthy-input problem and locate relevant information: (1) extended transformer models such as Longformer, (2) retrieve-then-summarize pipeline models with several dialogue utterance retrieval methods, and (3) hierarchical dialogue encoding models such as HMNet. Our experimental results on three long dialogue datasets (QMSum, MediaSum, SummScreen) show that the retrieve-then-summarize pipeline models yield the best performance. We also demonstrate that summary quality can be further improved with a stronger retrieval model and pretraining on suitable external summarization datasets.
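To make the retrieve-then-summarize strategy concrete, the sketch below shows a minimal two-stage pipeline for query-based long dialogue summarization: a retriever first selects the utterances most relevant to a query, and a standard pre-trained summarizer then condenses only that retrieved context. The specific library choices (rank_bm25 for BM25 retrieval, Hugging Face transformers with a BART checkpoint) and all parameter values are illustrative assumptions, not the exact setup evaluated in this work.

```python
# Minimal sketch of a retrieve-then-summarize pipeline (assumed setup,
# not the paper's exact configuration).
from rank_bm25 import BM25Okapi
from transformers import pipeline

def retrieve_then_summarize(query: str, utterances: list[str], top_k: int = 20) -> str:
    # Stage 1: score every dialogue utterance against the query with BM25
    # (one of several possible utterance retrieval methods).
    bm25 = BM25Okapi([u.lower().split() for u in utterances])
    scores = bm25.get_scores(query.lower().split())
    top_ids = sorted(
        sorted(range(len(utterances)), key=lambda i: scores[i], reverse=True)[:top_k]
    )
    # Keep the retrieved utterances in their original dialogue order so the
    # summarizer sees a coherent, chronologically ordered context.
    context = " ".join(utterances[i] for i in top_ids)
    # Stage 2: summarize only the retrieved context, which now fits within
    # the input limit of a standard pre-trained summarization model.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    result = summarizer(context, max_length=128, min_length=32, do_sample=False)
    return result[0]["summary_text"]
```

The key design point is that truncation happens by relevance rather than by position: instead of clipping the dialogue at the model's token limit, the retriever discards utterances unlikely to matter for the query, which addresses the sparse, context-dependent distribution of relevant information in long dialogues.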