Multi-document summarization (MDS) aims to generate a summary for a set of related documents. We propose HGSUM, an MDS model that extends an encoder-decoder architecture to incorporate a heterogeneous graph representing different semantic units (e.g., words and sentences) of the documents. This contrasts with existing MDS models, which do not consider different edge types in graphs and therefore fail to capture the diversity of relationships in the documents. To preserve only the key information and relationships of the documents in the heterogeneous graph, HGSUM uses graph pooling to compress the input graph. To guide HGSUM in learning this compression, we introduce an additional objective that maximizes the similarity between the compressed graph and the graph constructed from the ground-truth summary during training. HGSUM is trained end-to-end with the graph similarity and standard cross-entropy objectives. Experimental results on MULTI-NEWS, WCEP-100, and ARXIV show that HGSUM outperforms state-of-the-art MDS models. The code for our model and experiments is available at: https://github.com/oaimli/HGSum.
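A minimal sketch of how the two training objectives could be combined, assuming PyTorch; the function name `hgsum_loss`, the weighting factor `alpha`, and the use of cosine similarity as the graph-similarity measure are illustrative assumptions rather than the paper's exact formulation (see the linked repository for the actual implementation).

```python
import torch
import torch.nn.functional as F

def hgsum_loss(decoder_logits, target_ids, compressed_graph_emb, summary_graph_emb,
               pad_id=0, alpha=1.0):
    """Combined training loss: token-level cross-entropy over the generated
    summary plus a graph-similarity term between the pooled (compressed)
    source graph and the graph built from the ground-truth summary.
    `alpha` is a hypothetical weight on the graph-similarity term."""
    # Standard cross-entropy over summary tokens, ignoring padding positions.
    ce = F.cross_entropy(
        decoder_logits.view(-1, decoder_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=pad_id,
    )
    # Graph-similarity objective: maximizing cosine similarity between the two
    # graph embeddings is equivalent to minimizing (1 - cosine similarity).
    sim = F.cosine_similarity(compressed_graph_emb, summary_graph_emb, dim=-1).mean()
    graph_loss = 1.0 - sim
    return ce + alpha * graph_loss
```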