In a citation graph, adjacent paper nodes share related scientific terms and topics. The graph therefore carries structural information about document-level relatedness that paper summarization can exploit to look beyond intra-document information. In this work, we focus on leveraging citation graphs to improve extractive summarization of scientific papers under different settings. We first propose a Multi-granularity Unsupervised Summarization model (MUS) as a simple, low-cost solution to the task. MUS fine-tunes a pre-trained encoder on the citation graph through link prediction tasks. Abstract sentences are then extracted from the corresponding paper using multi-granularity information. Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework. Motivated by this, we next propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results when large-scale labeled data are available. Besides employing link prediction as an auxiliary task, GSS introduces a gated sentence encoder and a graph information fusion module that use the graph information to refine the sentence representations. Experiments on a public benchmark dataset show that MUS and GSS bring substantial improvements over the prior state-of-the-art model.
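To make the link prediction objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes each paper is already mapped to an embedding vector, scores a candidate citation link with a sigmoid over the dot product, and averages binary cross-entropy over observed (positive) and sampled (negative) pairs. The scoring function, the loss, and the name `link_prediction_loss` are all illustrative assumptions; the abstract does not specify them.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def link_prediction_loss(embeddings, positive_edges, negative_edges):
    """Mean binary cross-entropy over citation pairs (hypothetical helper).

    embeddings: dict mapping a paper id to its embedding (list of floats).
    positive_edges: pairs of papers that are linked in the citation graph.
    negative_edges: sampled pairs of papers that are not linked.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for u, v in positive_edges:
        p = sigmoid(dot(embeddings[u], embeddings[v]))
        total -= math.log(p + eps)
    for u, v in negative_edges:
        p = sigmoid(dot(embeddings[u], embeddings[v]))
        total -= math.log(1.0 - p + eps)
    return total / (len(positive_edges) + len(negative_edges))


# Toy embeddings: papers A and B point the same way, C points the opposite way.
toy_embeddings = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [-1.0, 0.0]}
consistent = link_prediction_loss(toy_embeddings, [("A", "B")], [("A", "C")])
inconsistent = link_prediction_loss(toy_embeddings, [("A", "C")], [("A", "B")])
```

In this toy setup the loss is lower when the linked pair has similar embeddings, which is the pressure that fine-tuning on the citation graph would place on the encoder under these assumptions.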