In unsupervised extractive summarization, learning high-quality sentence representations is essential for selecting salient sentences from the input document. Previous studies have focused mainly on using statistical approaches or pre-trained language models (PLMs) to extract sentence embeddings, while ignoring the rich information inherent in the heterogeneous types of interaction between words and sentences. In this paper, we are the first to propose an unsupervised extractive summarization method for Chinese documents based on heterogeneous graph embeddings (HGEs). We construct a heterogeneous text graph that captures interactions at different granularities by incorporating graph structural information. Moreover, the proposed graph is general and flexible: additional nodes, such as keywords, can be easily integrated. Experimental results demonstrate that our method consistently outperforms a strong baseline on three summarization datasets.
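The abstract only outlines the graph, so the following is a hypothetical toy illustration, not the authors' HGE method. It builds a word-sentence graph with typed nodes, optionally adds keyword nodes (mirroring the claim that extra node types are easy to integrate), and, in place of learned heterogeneous graph embeddings, scores sentences with weighted PageRank, a TextRank-style centrality, over that graph. The node types, unit edge weights, the PageRank substitute, and the function names `build_hetero_graph`, `rank_nodes`, and `extract_summary` are all assumptions; sentences are assumed to be already word-segmented, as Chinese text would need to be.

```python
from collections import defaultdict


def build_hetero_graph(sentences, keywords=()):
    """Hypothetical toy graph with typed nodes ("sent", i), ("word", w), ("kw", w).

    sentences: list of token lists (word segmentation assumed done upstream).
    keywords: optional keyword strings, added as a separate node type.
    Returns an undirected weighted adjacency map: node -> {neighbor: weight}.
    """
    adj = defaultdict(lambda: defaultdict(float))
    kw = set(keywords)

    def link(u, v, w=1.0):
        adj[u][v] += w
        adj[v][u] += w

    for i, tokens in enumerate(sentences):
        s = ("sent", i)
        adj.setdefault(s, defaultdict(float))  # sentences with no tokens still get a node
        for tok in tokens:
            link(s, ("word", tok))  # word-sentence edge; repeats accumulate as term frequency
            if tok in kw:
                link(s, ("kw", tok))  # extra keyword-sentence edge (assumed design)
    return adj


def rank_nodes(adj, damping=0.85, iters=50):
    """Weighted PageRank by power iteration: a stand-in for learned HGE scoring."""
    nodes = list(adj)
    n = len(nodes)
    if n == 0:
        return {}
    strength = {u: sum(adj[u].values()) for u in nodes}
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            if strength[u] == 0:
                continue  # isolated node: its mass is dropped (acceptable for ranking here)
            for v, w in adj[u].items():
                nxt[v] += damping * score[u] * w / strength[u]
        score = nxt
    return score


def extract_summary(sentences, k=3, keywords=()):
    """Return indices of the k highest-scoring sentences, in document order."""
    score = rank_nodes(build_hetero_graph(sentences, keywords))
    ranked = sorted(range(len(sentences)), key=lambda i: score[("sent", i)], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    doc = [["x", "y"], ["x", "y", "z"], ["q"]]
    print(extract_summary(doc, k=1))  # [1]
```

In this sketch, replacing PageRank with a learned graph encoder would only change `rank_nodes`; the builder shows why a new node type such as keywords is cheap to add, since it is just another set of typed nodes and edges.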