Meeting summarization is a challenging task due to the dynamic interactions among multiple speakers and the lack of sufficient training data. Existing methods view a meeting as a linear sequence of utterances, ignoring the diverse relations between utterances. Moreover, the limited labeled data further hinders data-hungry neural models. In this paper, we mitigate the above challenges by introducing dialogue-discourse relations. First, we present a Dialogue Discourse-Aware Meeting Summarizer (DDAMS) that explicitly models the interactions between utterances in a meeting by modeling their discourse relations. The core module is a relational graph encoder, in which utterances and discourse relations are modeled in a graph interaction manner. Moreover, we devise a Dialogue Discourse-Aware Data Augmentation (DDADA) strategy to construct a pseudo-summarization corpus from existing input meetings; this corpus is 20 times larger than the original dataset and can be used to pretrain DDAMS. Experimental results on the AMI and ICSI meeting datasets show that our full system achieves SOTA performance. Our code will be available at: https://github.com/xcfcode/DDAMS.
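To make the relational graph encoder idea concrete, here is a minimal sketch of one relation-aware message-passing step over a graph of utterances, in the spirit of a relational GCN. All function names, shapes, and the toy discourse relations below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def relational_gcn_layer(node_feats, edges, num_relations, weights, self_weight):
    """One relation-aware message-passing step (illustrative sketch).

    node_feats:  (num_nodes, d) utterance representations
    edges:       list of (src, dst, rel) discourse links, rel in [0, num_relations)
    weights:     (num_relations, d, d) one transform per discourse relation type
    self_weight: (d, d) self-loop transform
    """
    num_nodes, d = node_feats.shape
    out = node_feats @ self_weight  # self-loop term

    # Count incoming edges per (node, relation) for mean aggregation.
    counts = np.zeros((num_nodes, num_relations))
    for src, dst, rel in edges:
        counts[dst, rel] += 1

    # Aggregate neighbor messages, transformed by the relation-specific weight.
    for src, dst, rel in edges:
        out[dst] += (node_feats[src] @ weights[rel]) / counts[dst, rel]

    return np.maximum(out, 0.0)  # ReLU nonlinearity

# Toy example: 3 utterances linked by 2 hypothetical discourse relation types,
# e.g. utterance 0 -Elaboration-> 1 and 1 -QuestionAnswer-> 2.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4))
edges = [(0, 1, 0), (1, 2, 1)]
W = rng.standard_normal((2, 4, 4)) * 0.1
W_self = np.eye(4)
h = relational_gcn_layer(feats, edges, 2, W, W_self)
print(h.shape)  # (3, 4): one updated representation per utterance
```

In a full summarizer, several such layers would be stacked so that each utterance representation absorbs information from discourse-related utterances before decoding the summary.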