Citation graphs can help generate high-quality summaries of scientific papers: a paper's references and their correlations provide additional knowledge for contextualising its background and main contributions. Despite this promise, incorporating citation graphs into summarization remains challenging, because it is difficult to accurately identify and leverage the content in references that is relevant to a source paper, and to capture correlations of varying strength between them. Existing methods either ignore references or indiscriminately use only their abstracts, and so fail to address this challenge. To fill this gap, we propose a novel citation-aware scientific paper summarization framework based on citation graphs, which accurately locates and incorporates the salient content of references and captures the varying relevance between source papers and their references. Specifically, we first build a domain-specific dataset, PubMedCite, with about 192K biomedical scientific papers and a large citation graph preserving 917K citation relationships between them. The dataset preserves the salient content extracted from the full texts of references, together with the weighted correlation between that salient content and the source paper. Building on it, we design a self-supervised citation-aware summarization framework (CitationSum) with graph contrastive learning, which improves summary generation by efficiently fusing the salient information in references with the source paper's content under the guidance of their correlations. Experimental results show that our model outperforms state-of-the-art methods by efficiently leveraging the information in references and citation correlations.
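To make the idea of a weighted citation graph concrete, the following is a minimal sketch and not the PubMedCite construction itself. The abstract does not say how salient content is extracted or how correlation weights are computed, so the `Paper` and `CitationGraph` classes, the `jaccard` helper, and the use of token-overlap similarity as the edge weight are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    paper_id: str
    text: str          # content of the paper when it acts as a source
    salient: str = ""  # salient content, assumed to be extracted beforehand


@dataclass
class CitationGraph:
    papers: dict = field(default_factory=dict)  # paper_id -> Paper
    edges: dict = field(default_factory=dict)   # source_id -> {ref_id: weight}

    def add_paper(self, paper):
        self.papers[paper.paper_id] = paper

    def add_citation(self, source_id, ref_id):
        # ASSUMPTION: the edge weight is the Jaccard token overlap between
        # the source text and the reference's salient content. This is a
        # stand-in; the abstract does not specify the weighting scheme.
        source = self.papers[source_id]
        ref = self.papers[ref_id]
        weight = jaccard(source.text, ref.salient)
        self.edges.setdefault(source_id, {})[ref_id] = weight

    def top_references(self, source_id, k=2):
        # Return the k references most strongly correlated with the source.
        refs = self.edges.get(source_id, {})
        return sorted(refs.items(), key=lambda kv: kv[1], reverse=True)[:k]


def jaccard(a, b):
    # Token-level Jaccard similarity on lowercased whitespace tokens.
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


graph = CitationGraph()
graph.add_paper(Paper("src", "gene expression in tumor cells"))
graph.add_paper(Paper("ref1", "", salient="tumor cells gene expression profiling"))
graph.add_paper(Paper("ref2", "", salient="deep learning for images"))
graph.add_citation("src", "ref1")
graph.add_citation("src", "ref2")
best = graph.top_references("src", k=1)
```

In this toy graph `best` holds only `ref1`, since `ref2` shares no tokens with the source. A summarizer could use such weights to decide how much each reference contributes when its salient content is fused with the source paper, which is the role the abstract assigns to citation correlations.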