Summarization systems face the core challenge of identifying and selecting important information. In this paper, we tackle the problem of content selection in unsupervised extractive summarization of long, structured documents. We introduce a wide range of heuristics that leverage cognitive representations of content units and how these are retained or forgotten in human memory. We find that properties of these representations of human memory can be exploited to capture the relevance of content units in scientific articles. Experiments show that our proposed heuristics are effective at leveraging cognitive structures and the organization of the document (i.e., the sections of an article), and automatic and human evaluations provide strong evidence that these heuristics extract more summary-worthy content units.