Narrative summarization aims to produce a distilled version of a narrative that describes its most salient events and characters. Summarizing a narrative is challenging because it requires an understanding of event causality and character behaviors. To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset. It contains 122K narrative documents, collected from plot descriptions of movies and TV episodes across diverse genres, paired with their corresponding abstractive summaries. Experiments show that there is a large performance gap between humans and state-of-the-art summarization models on NarraSum. We hope that this dataset will promote future research in summarization, as well as broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum.