A primary objective of news articles is to establish the factual record of an event, frequently achieved by conveying both the details of the event (i.e., the 5 Ws: Who, What, Where, When, and Why) and how people reacted to it (i.e., reported statements). However, existing work on news summarization focuses almost exclusively on event details. In this work, we propose the novel task of summarizing the reactions of different speakers to a given event, as expressed in their reported statements. To this end, we create SUMREN, a new multi-document summarization benchmark comprising 745 summaries of reported statements from various public figures, drawn from 633 news articles covering 132 events. We propose an automatic approach for generating silver training data for this task, which helps smaller models such as BART achieve GPT-3-level performance. Finally, we introduce a pipeline-based framework for summarizing reported speech, which we empirically show generates summaries that are more abstractive and more factual than those produced by baseline query-focused summarization approaches.