Most current extractive summarization models generate summaries by selecting salient sentences. However, one problem with sentence-level extractive summarization is the gap between the human-written gold summary and the oracle sentence labels. In this paper, we propose extracting fact-level semantic units for better extractive summarization. We also introduce a hierarchical structure that incorporates multiple levels of textual granularity into the model. In addition, we integrate our model with BERT using a hierarchical graph mask. This allows us to combine BERT's natural language understanding ability with the structural information without increasing the scale of the model. Experiments on the CNN/DailyMail dataset show that our model achieves state-of-the-art results.
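As an illustration of how a hierarchical graph mask could restrict attention across granularity levels, the following is a minimal sketch and not the exact scheme used in the paper. The node layout (token nodes, then fact nodes, then sentence nodes), the connection rules, and the function name `build_hierarchical_mask` are all assumptions made for this example.

```python
def build_hierarchical_mask(token_to_fact, fact_to_sent):
    """Build a symmetric 0/1 attention mask over [tokens | facts | sentences].

    Assumed (hypothetical) connection rules:
      - tokens attend to tokens in the same fact and to their own fact node
      - fact nodes attend to facts in the same sentence and to their sentence node
      - sentence nodes attend to all sentence nodes
    """
    n_tok = len(token_to_fact)
    n_fact = len(fact_to_sent)
    n_sent = max(fact_to_sent) + 1
    size = n_tok + n_fact + n_sent
    mask = [[0] * size for _ in range(size)]

    def connect(i, j):
        # Undirected edge: both positions may attend to each other.
        mask[i][j] = 1
        mask[j][i] = 1

    # Every node attends to itself.
    for i in range(size):
        mask[i][i] = 1

    # Token level: same-fact tokens, plus the token's fact node.
    for i in range(n_tok):
        for j in range(n_tok):
            if token_to_fact[i] == token_to_fact[j]:
                connect(i, j)
        connect(i, n_tok + token_to_fact[i])

    # Fact level: same-sentence facts, plus the fact's sentence node.
    for f in range(n_fact):
        for g in range(n_fact):
            if fact_to_sent[f] == fact_to_sent[g]:
                connect(n_tok + f, n_tok + g)
        connect(n_tok + f, n_tok + n_fact + fact_to_sent[f])

    # Sentence level: fully connected.
    for s in range(n_sent):
        for t in range(n_sent):
            connect(n_tok + n_fact + s, n_tok + n_fact + t)

    return mask


# Toy input: 5 tokens in 3 facts; facts 0 and 1 belong to sentence 0, fact 2 to sentence 1.
mask = build_hierarchical_mask([0, 0, 1, 2, 2], [0, 0, 1])
```

Because such a mask only changes which positions may attend to each other, it adds no trainable parameters, which is consistent with the claim that the structure is incorporated without increasing the scale of the model. In practice the 0/1 matrix would typically be converted to an additive mask (0 for allowed pairs, a large negative value for blocked pairs) before the attention softmax.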