Text summarization is a user-preference-based task: for the same document, different users often have different priorities for the summary. Granularity, a key aspect of customization in summarization, measures the semantic coverage of the source document by the summary. However, developing systems that can generate summaries with customizable semantic coverage remains under-explored. In this paper, we propose GranuSum, the first unsupervised multi-granularity summarization framework. We take events as the basic semantic units of the source documents and rank these events by their salience. We also develop a model that summarizes input documents using given events as anchors and hints. By taking different numbers of events as input, GranuSum can produce multi-granular summaries in an unsupervised manner. In addition, we annotate a new benchmark, GranuDUC, that contains multiple summaries at different granularities for each document cluster. Experimental results confirm the substantial superiority of GranuSum over strong baselines on multi-granularity summarization. Further, by exploiting event information, GranuSum also achieves state-of-the-art performance in the conventional unsupervised abstractive setting. The dataset for this paper is available at: https://github.com/maszhongming/GranuDUC
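The event-ranking and hint-selection idea described above can be illustrated with a minimal sketch. This is not the GranuSum implementation: the salience score (how many sentences mention every word of an event), the function names `rank_events`, `select_hints`, and `build_model_input`, and the `||` / `=>` input format are all assumptions made for illustration, and events are assumed to be already extracted as short strings.

```python
def rank_events(events, sentences):
    """Rank events by a toy salience score (an assumption, not the paper's ranker):
    the number of sentences that contain every word of the event."""
    tokenized = [set(s.lower().split()) for s in sentences]

    def salience(event):
        words = set(event.lower().split())
        return sum(words <= sent for sent in tokenized)

    # Highest salience first; ties keep their original order.
    return sorted(events, key=salience, reverse=True)


def select_hints(ranked_events, granularity):
    """Keep the top-`granularity` events; more events ask for a more detailed summary."""
    return ranked_events[:granularity]


def build_model_input(hints, document):
    """Join the hints and the document into one string (hypothetical format)
    that a summarization model could take as input."""
    return " || ".join(hints) + " => " + document


if __name__ == "__main__":
    sentences = [
        "the storm hit the coast",
        "the storm hit inland towns",
        "the storm flooded roads",
        "officials closed schools",
    ]
    events = ["officials closed", "storm hit", "storm"]
    ranked = rank_events(events, sentences)
    for k in (1, 2, 3):  # coarse to fine granularity
        print(build_model_input(select_hints(ranked, k), " ".join(sentences)))
```

Varying the number of selected events is the only knob in this sketch, which mirrors the idea that granularity is controlled by how many events are passed to the summarizer.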