Recently, neural topic models (NTMs) have been incorporated into pre-trained language models (PLMs) to capture global semantic information for text summarization. However, these methods remain limited in how they capture and integrate global semantic information. In this paper, we propose a novel model, the graph contrastive topic enhanced language model (GRETEL), which integrates a graph contrastive topic model with a pre-trained language model to fully leverage both global and local contextual semantics for long-document extractive summarization. To better capture global semantic information and incorporate it into the PLM, the graph contrastive topic model combines a hierarchical transformer encoder with graph contrastive learning to fuse semantic information from the global document context and the gold summary. In this way, GRETEL encourages the model to efficiently extract salient sentences that are topically related to the gold summary, rather than redundant sentences that cover sub-optimal topics. Experimental results on both general-domain and biomedical datasets demonstrate that our proposed method outperforms state-of-the-art (SOTA) methods.
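The abstract names graph contrastive learning as the mechanism that aligns document-level topic representations with those of the gold summary, but it does not spell out the objective. As a rough illustration only, the sketch below implements a standard InfoNCE-style contrastive loss between hypothetical topic embeddings of documents and their gold summaries, using in-batch negatives; the names `doc_topic`, `sum_topic`, and the temperature value are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def graph_contrastive_loss(doc_topic: torch.Tensor,
                           sum_topic: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrastive loss over topic embeddings.

    doc_topic: (batch, dim) topic embeddings of documents (hypothetical
        output of a topic encoder over the document graph).
    sum_topic: (batch, dim) topic embeddings of the corresponding gold
        summaries; row i of each tensor forms the positive pair.
    """
    # Cosine similarity via L2-normalized embeddings.
    doc = F.normalize(doc_topic, dim=-1)
    summ = F.normalize(sum_topic, dim=-1)
    logits = doc @ summ.t() / temperature  # (batch, batch) similarity matrix

    # Positives lie on the diagonal; all other summaries in the batch
    # serve as negatives, pushing apart topically unrelated pairs.
    targets = torch.arange(doc.size(0), device=doc.device)
    return F.cross_entropy(logits, targets)
```

Under this formulation, maximizing the diagonal similarities pulls each document's topic representation toward its gold summary while the off-diagonal (in-batch) terms push it away from other summaries, which matches the stated goal of favoring sentences topically aligned with the gold summary over those covering sub-optimal topics.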