Multi-document summarization entails producing concise synopses of collections of inputs. For some applications, the synopsis should accurately \emph{synthesize} inputs with respect to a key property or aspect. For example, a synopsis of film reviews written about a particular movie should reflect the average critic consensus. As a more consequential example, consider the narrative summaries that accompany biomedical \emph{systematic reviews} of clinical trial results. These narratives should fairly summarize the potentially conflicting results of individual trials. In this paper, we ask: to what extent do modern multi-document summarization models implicitly perform this type of synthesis? To assess this, we perform a suite of experiments that probe the degree to which conditional generation models trained for summarization using standard methods yield outputs that appropriately synthesize their inputs. We find that existing models do partially perform synthesis, but do so imperfectly. In particular, they are over-sensitive to changes in input ordering and under-sensitive to changes in input composition (e.g., the ratio of positive to negative movie reviews). We propose a simple, general method for improving model synthesis capabilities: generate an explicitly diverse set of candidate outputs, then select from these the string best aligned with the expected aggregate measure for the inputs, or \emph{abstain} when the model produces no good candidate. This approach improves model synthesis performance. We hope that highlighting the need for synthesis (in some summarization settings) motivates further research into multi-document summarization methods and learning objectives that explicitly account for the need to synthesize.
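The selection-or-abstention step described above can be illustrated with a minimal sketch. It assumes that each input already carries a scalar measure of the property of interest (e.g., a per-review sentiment score in [0, 1]), that the expected aggregate is the mean of those scores, and that candidate summaries come from some diverse decoding procedure that is not shown here. The function name `select_synthesized_summary`, the candidate scorer `score_fn`, and the abstention threshold `max_gap` are all hypothetical illustrations and are not the authors' implementation.

```python
from statistics import mean
from typing import Callable, Optional, Sequence


def select_synthesized_summary(
    input_scores: Sequence[float],
    candidates: Sequence[str],
    score_fn: Callable[[str], float],
    max_gap: float = 0.1,
) -> Optional[str]:
    """Return the candidate whose measured property is closest to the inputs' aggregate.

    Assumptions (hypothetical, for illustration only):
      - input_scores: one scalar per input document (e.g., review sentiment in [0, 1]).
      - score_fn: a scorer that measures the same property on a candidate summary.
      - max_gap: the largest acceptable distance; beyond it we abstain (return None).
    """
    if not candidates or not input_scores:
        return None  # nothing to choose from or nothing to aggregate: abstain
    # Expected aggregate measure for the inputs, assumed here to be the mean.
    target = mean(input_scores)
    # Pick the candidate whose score is closest to the expected aggregate.
    best = min(candidates, key=lambda c: abs(score_fn(c) - target))
    # Abstain when even the best candidate is too far from the target.
    if abs(score_fn(best) - target) > max_gap:
        return None
    return best
```

For example, given a toy scorer `{"Mostly positive.": 0.8, "Mixed reviews.": 0.5, "Panned.": 0.1}.__getitem__` and five reviews of which four are positive (`[1.0, 1.0, 0.0, 1.0, 1.0]`, mean 0.8), the sketch returns `"Mostly positive."`. If the only candidate were `"Panned."` while the reviews were evenly split, it would return `None` and abstain. Using the mean and an absolute-distance threshold is just one simple choice; other aggregates or distance measures could be substituted.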