To mitigate the lack of diverse dialogue summarization datasets in academia, we present methods that use non-dialogue summarization data to enhance dialogue summarization systems. We apply transformations to document summarization data pairs to create training data better suited to dialogue summarization. The suggested transformations also retain desirable properties of non-dialogue datasets, such as improved faithfulness to the source text. We conduct extensive experiments in both English and Korean to verify our approach. Although absolute gains in ROUGE naturally plateau as more dialogue summarization samples are introduced, training with non-dialogue data significantly improves summarization performance in zero- and few-shot settings and enhances faithfulness across all training regimes.