Dialogue summarization has recently garnered significant attention due to its wide range of applications. However, existing methods are suboptimal because they ignore the inherent structure of dialogue and rely heavily on labeled data, which leads to poor performance in new domains. In this work, we propose DIONYSUS (dynamic input optimization in pre-training for dialogue summarization), a pre-trained encoder-decoder model for summarizing dialogues in any new domain. To pre-train DIONYSUS, we create two pseudo summaries for each dialogue example: one produced by a fine-tuned summarization model and the other a collection of dialogue turns that convey important information. We then choose one of these pseudo summaries based on differences in information distribution across dialogue types. The selected pseudo summary serves as the pre-training objective, and DIONYSUS is trained with a self-supervised approach on a large dialogue corpus. Our experiments show that DIONYSUS outperforms existing methods on six datasets in both zero-shot and few-shot settings, as measured by ROUGE scores.
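To make the pseudo-summary selection concrete, here is a minimal sketch in Python. It assumes a GSG-style heuristic for picking "principal" dialogue turns and uses ROUGE overlap with the full dialogue as a stand-in proxy for the information-distribution criterion described above; the function names (principal_turns, select_pseudo_summary), the top-k parameter, and the example generated summary are hypothetical illustrations, not the paper's exact procedure.

# Hypothetical sketch of pseudo-summary construction and selection.
# The selection rule (ROUGE overlap with the dialogue) is an assumed
# proxy for the abstract's information-distribution criterion.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

def principal_turns(turns, k=2):
    """Pick the k turns that best summarize the remaining turns
    (a GSG-style heuristic, assumed here for illustration)."""
    def informativeness(i):
        rest = " ".join(t for j, t in enumerate(turns) if j != i)
        return scorer.score(rest, turns[i])["rouge1"].fmeasure
    ranked = sorted(range(len(turns)), key=informativeness, reverse=True)
    return " ".join(turns[i] for i in sorted(ranked[:k]))

def select_pseudo_summary(turns, generated_summary, k=2):
    """Choose between a model-generated summary and the principal
    turns by their ROUGE overlap with the whole dialogue."""
    dialogue = " ".join(turns)
    candidates = [generated_summary, principal_turns(turns, k)]
    return max(candidates,
               key=lambda c: scorer.score(dialogue, c)["rouge1"].fmeasure)

turns = [
    "A: Are we still meeting tomorrow?",
    "B: Yes, 10am at the main office.",
    "A: Great, I'll bring the slides.",
]
print(select_pseudo_summary(turns, "A and B confirm a 10am meeting."))

Whichever candidate wins becomes the self-supervised pre-training target for that dialogue; in this sketch the extractive candidate tends to win for information-dense exchanges, matching the abstract's motivation for keeping both candidate types.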