The goal of text summarization is to compress documents to the relevant information while excluding background information already known to the receiver. So far, summarization researchers have given considerably more attention to relevance than to background knowledge. In contrast, this work puts background knowledge in the foreground. Building on the realization that the choices made by human summarizers and annotators contain implicit information about their background knowledge, we develop and compare techniques for inferring background knowledge from summarization data. Based on this framework, we define summary scoring functions that explicitly model background knowledge, and show that these scoring functions fit human judgments significantly better than baselines. We illustrate some of the many potential applications of our framework. First, we provide insights into human information importance priors. Second, we demonstrate that averaging the background knowledge of multiple, potentially biased annotators or corpora greatly improves summary-scoring performance. Finally, we discuss potential applications of our framework beyond summarization.