In this paper, we propose a new way of digesting news content by introducing the task of segmenting a news article into multiple sections and generating a summary for each section. We make two contributions towards this new task. First, we create and release a dataset, SegNews, consisting of 27k news articles with sections and aligned heading-style section summaries. Second, we propose a novel segmentation-based language generation model, adapted from pre-trained language models, that jointly segments a document and produces a summary for each section. Experimental results on SegNews demonstrate that our model outperforms several state-of-the-art sequence-to-sequence generation models on this new task.