Handling long texts with structural information and avoiding redundancy between summary sentences are essential in extractive document summarization. In this work, we propose GoSum, a novel reinforcement-learning-based extractive model for long-paper summarization. GoSum encodes states by building a heterogeneous graph over different discourse levels for each input document. We evaluate the model on two scientific-article summarization datasets, PubMed and arXiv, where it outperforms all extractive summarization models and most of the strong abstractive baselines.
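The abstract does not detail how the heterogeneous graph is constructed; the snippet below is only a minimal sketch, assuming sentence- and section-level nodes as the two discourse levels and simple membership edges. The function name and graph layout are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def build_hetero_graph(doc_sections):
    """Toy heterogeneous graph over two discourse levels (illustrative only).

    `doc_sections`: list of sections, each a list of sentences.
    Edges record sentence-to-section membership and section-to-document links,
    keyed by (source_type, relation, target_type).
    """
    edges = defaultdict(list)
    sent_id = 0
    for sec_id, sentences in enumerate(doc_sections):
        # every section belongs to the single document node (id 0)
        edges[("section", "part_of", "document")].append((sec_id, 0))
        for _ in sentences:
            # each sentence node is linked to its enclosing section
            edges[("sentence", "in", "section")].append((sent_id, sec_id))
            sent_id += 1
    return dict(edges)

if __name__ == "__main__":
    doc = [
        ["Intro sentence one.", "Intro sentence two."],
        ["Method sentence one."],
    ]
    for edge_type, pairs in build_hetero_graph(doc).items():
        print(edge_type, pairs)
```

In an actual model, node features (e.g., sentence encodings) would be attached to such a structure and refined by graph message passing before the extractive policy scores sentences.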