In this paper, we exploit the inherent segment structure of documents to improve extractive summarization. We build two text segmentation models and identify the best strategy for introducing their output predictions into an extractive summarization model. Experimental results on a corpus of scientific articles show that extractive summarization benefits from a highly accurate segmentation method. In particular, most of the improvement occurs in documents where the most relevant information is not at the beginning; we therefore conclude that segmentation helps reduce the lead bias problem.