Summarizing text-rich documents has long been studied in the literature, but most existing efforts summarize a static, predefined multi-document set. With the rapid development of online platforms for generating and distributing text-rich documents, there is an urgent need to continuously summarize dynamically evolving multi-document sets, where the composition of both documents and sets changes over time. This is especially challenging because the summarization must be not only effective in incorporating relevant, novel, and distinctive information from each concurrent multi-document set, but also efficient enough to serve online applications. In this work, we propose a new summarization problem, Evolving Multi-Document sets stream Summarization (EMDS), and introduce a novel unsupervised algorithm, PDSum, based on the idea of prototype-driven continuous summarization. PDSum builds a lightweight prototype of each multi-document set and exploits it to adapt to new documents while preserving accumulated knowledge from previous documents. To update summaries, the most representative sentences for each multi-document set are extracted by measuring their similarities to the prototypes. A thorough evaluation on real multi-document set streams demonstrates that PDSum outperforms state-of-the-art unsupervised multi-document summarization algorithms on EMDS in terms of relevance, novelty, and distinctiveness, and that it is robust to various evaluation settings.
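The prototype idea can be illustrated with a minimal sketch. This is not the PDSum implementation: the bag-of-words `embed` function, the `decay` parameter, and the `PrototypeSummarizer` class are hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of keeping one lightweight prototype per multi-document set, decaying it as new documents arrive so that earlier knowledge is retained, and extracting the sentences most similar to the prototype.

```python
import math
from collections import Counter, defaultdict


def embed(sentence):
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PrototypeSummarizer:
    """Hypothetical sketch: one decayed term-weight prototype per document set."""

    def __init__(self, decay=0.5):
        # decay < 1 down-weights old content without discarding it (assumed scheme).
        self.decay = decay
        self.prototypes = defaultdict(dict)

    def update(self, set_id, sentences):
        """Fold new sentences into the set's prototype."""
        proto = self.prototypes[set_id]
        for term in proto:
            proto[term] *= self.decay  # keep accumulated knowledge, at reduced weight
        for sentence in sentences:
            for term, count in embed(sentence).items():
                proto[term] = proto.get(term, 0.0) + count / len(sentences)

    def summarize(self, set_id, sentences, k=1):
        """Return the k sentences most similar to the set's prototype."""
        proto = self.prototypes[set_id]
        ranked = sorted(sentences, key=lambda s: cosine(embed(s), proto), reverse=True)
        return ranked[:k]
```

A caller would invoke `update` each time new documents arrive for a set and then `summarize` over the candidate sentences. Because only the prototype is stored, the cost per update does not grow with the number of previously seen documents, which is the efficiency property the abstract emphasizes.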