The citation graph is essential for generating high-quality summaries of scientific papers: the references of a paper and their correlations provide extra knowledge for understanding its background and main contributions. Despite this promise, effectively incorporating the citation graph remains a major challenge, because it is difficult to accurately identify and leverage the relevant content in a source paper's references and to model correlations of varying intensity. Existing methods either ignore references or indiscriminately use only their abstracts, and so fail to address this challenge. To fill the gap, we propose a novel citation-aware scientific paper summarization framework based on the citation graph, which accurately locates and incorporates the salient content of references and captures the varying relevance between source papers and their references. Specifically, we first build a domain-specific dataset, PubMedCite, with about 192K biomedical scientific papers and a large citation graph preserving 917K citation relationships between them. The dataset preserves the salient content extracted from the full texts of references, together with the weighted correlation between that salient content and the source paper. Based on it, we design a self-supervised citation-aware summarization framework (CitationSum) with graph contrastive learning, which improves summary generation by efficiently fusing the salient information in references with the source paper's content under the guidance of their correlations. Experimental results show that our model outperforms state-of-the-art methods by efficiently leveraging the information in references and citation correlations.
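To make the idea of a weighted citation graph with preserved salient content more concrete, the following is a minimal sketch in Python. It is not the PubMedCite construction procedure or the CitationSum model: the class names (`CitationEdge`, `CitationGraph`), the `top_k` parameter, and the use of Jaccard token overlap as the relevance score are all assumptions made for illustration, since the abstract does not specify how salient content is extracted or how correlations are weighted.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap score in [0, 1]; an assumed stand-in for relevance."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class CitationEdge:
    """Hypothetical edge: one reference, its salient sentences, and a weight."""
    reference_id: str
    salient_sentences: List[str]
    weight: float


@dataclass
class CitationGraph:
    """Hypothetical container mapping a source paper to its weighted references."""
    edges: Dict[str, List[CitationEdge]] = field(default_factory=dict)

    def add_reference(self, source_id: str, source_text: str,
                      reference_id: str, reference_sentences: List[str],
                      top_k: int = 2) -> CitationEdge:
        # Keep the top_k reference sentences most similar to the source text.
        ranked = sorted(reference_sentences,
                        key=lambda s: jaccard(source_text, s), reverse=True)
        salient = ranked[:top_k]
        # Assumed edge weight: the best sentence-level score for this reference.
        weight = max((jaccard(source_text, s) for s in salient), default=0.0)
        edge = CitationEdge(reference_id, salient, weight)
        self.edges.setdefault(source_id, []).append(edge)
        return edge

    def ranked_references(self, source_id: str) -> List[CitationEdge]:
        # References ordered from most to least relevant to the source paper.
        return sorted(self.edges.get(source_id, []),
                      key=lambda e: e.weight, reverse=True)


source = "Graph contrastive learning improves summarization of biomedical papers."
graph = CitationGraph()
graph.add_reference("paper-A", source, "ref-2",
                    ["The weather dataset contains daily rainfall."], top_k=1)
graph.add_reference("paper-A", source, "ref-1",
                    ["We study contrastive learning on graph data.",
                     "Funding was provided by an agency."], top_k=1)
for edge in graph.ranked_references("paper-A"):
    print(edge.reference_id, round(edge.weight, 2), edge.salient_sentences)
```

In this toy setting, a reference whose sentences share vocabulary with the source paper receives a higher edge weight than an unrelated one, which is the kind of varying relevance the abstract says the framework captures. A learned relevance model would replace the token-overlap score in practice.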