Existing summarization systems mostly generate summaries relying purely on the content of the source document. However, even humans usually need references or exemplars to fully understand a source document and to write summaries in a particular format. How to find high-quality exemplars and incorporate them into summarization systems remains challenging and worth exploring. In this paper, we propose RetrievalSum, a novel retrieval-enhanced abstractive summarization framework consisting of a dense Retriever and a Summarizer. First, several closely related exemplars are retrieved as supplementary input to help the generation model understand the text more comprehensively. Furthermore, the retrieved exemplars also guide the model to capture the writing style of a specific corpus. We validate our method on a wide range of summarization datasets across multiple domains and with two backbone models: BERT and BART. Results show that our framework obtains significant improvements of 1.38 to 4.66 in ROUGE-1 score compared with powerful pre-trained models, and achieves a new state of the art on BillSum. Human evaluation demonstrates that our retrieval-enhanced model better captures the domain-specific writing style.
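As a rough illustration of the retrieve-then-summarize idea, the following minimal sketch ranks exemplar summaries by the similarity of their source documents to the input and appends the top ones as supplementary input. It is a sketch under stated assumptions, not the RetrievalSum implementation: the bag-of-words `embed` function is a toy stand-in for a learned dense encoder, the `[SEP]` concatenation format is assumed, and all function names and the tiny corpus are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(source, corpus, k=2):
    """Return the summaries of the k corpus documents most similar to `source`.

    `corpus` is a list of (document, summary) pairs.
    """
    query = embed(source)
    ranked = sorted(
        corpus, key=lambda pair: cosine(query, embed(pair[0])), reverse=True
    )
    return [summary for _, summary in ranked[:k]]


def build_input(source, exemplars, sep=" [SEP] "):
    """Concatenate the source with retrieved exemplars (assumed input format)."""
    return sep.join([source] + exemplars)


# Hypothetical (document, summary) pairs standing in for a training corpus.
corpus = [
    ("a bill to amend the tax code for small businesses",
     "Amends the tax code to benefit small businesses."),
    ("the team won the championship game last night",
     "Team wins championship."),
    ("a bill to amend the clean air act",
     "Amends the Clean Air Act."),
]

source = "a bill to amend the tax code for farmers"
exemplars = retrieve(source, corpus, k=2)
augmented = build_input(source, exemplars)
print(augmented)
```

In a real system the augmented string would be passed to the Summarizer (for example a BERT- or BART-based model), and the retriever would use learned dense embeddings rather than word counts.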