Text summarization is a personalized task: for a single document, different users often prefer different summaries. Granularity, a key aspect of this customization, measures how much of the source document's semantic content a summary covers. A coarse-grained summary contains only the most central event of the original text, while a fine-grained summary covers more sub-events and their details. However, previous studies mostly develop systems for the single-granularity scenario, and models that generate summaries with customizable semantic coverage remain under-explored. In this paper, we propose GranuSum, the first unsupervised multi-granularity summarization framework. We take events as the basic semantic units of the source documents and rank these events by their salience. We also develop a model that summarizes input documents with given events as anchors and hints. By varying the number of input events, GranuSum produces multi-granular summaries in an unsupervised manner. To evaluate multi-granularity summarization models, we annotate a new benchmark, GranuDUC, in which we write multiple summaries of different granularities for each document cluster. Experimental results show that GranuSum substantially outperforms several baseline systems on multi-granularity summarization. Furthermore, experiments on conventional unsupervised abstractive summarization tasks show that GranuSum, by exploiting event information, also achieves new state-of-the-art results in this scenario, outperforming strong baselines.
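The control mechanism described above (rank events by salience, then pass a chosen number of them to the summarizer as hints) can be illustrated with a minimal Python sketch. This is not GranuSum's implementation. The abstract does not say how events are extracted or how salience is computed, so the sketch assumes events arrive as short strings and uses average term frequency as a stand-in salience score. The names `salience_scores` and `select_hints` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def salience_scores(document, events):
    """Toy salience (an assumption, not the paper's scorer): the average
    document frequency of each event's tokens."""
    doc_counts = Counter(tokenize(document))
    scores = []
    for event in events:
        tokens = tokenize(event)
        if tokens:
            scores.append(sum(doc_counts[t] for t in tokens) / len(tokens))
        else:
            scores.append(0.0)
    return scores


def select_hints(document, events, num_events):
    """Return the `num_events` most salient events, in their original order.

    A small `num_events` corresponds to a coarse-grained summary request,
    a larger one to a fine-grained request.
    """
    scores = salience_scores(document, events)
    # Rank event indices by score; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(events)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:max(num_events, 0)])
    return [events[i] for i in keep]


if __name__ == "__main__":
    doc = ("The storm hit the coast. The storm flooded roads. "
           "Officials opened shelters.")
    events = ["storm hit coast", "storm flooded roads",
              "officials opened shelters"]
    for k in (1, 2, 3):
        print(k, select_hints(doc, events, k))
```

Because each larger selection extends the same ranking, the hints for a coarse summary are always a subset of the hints for a finer one. In a full system, the selected events would be passed to a summarization model together with the source document. That model is outside the scope of this sketch.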