Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

Writing a survey paper on a research topic usually requires covering the salient content of numerous related papers, which can be modeled as a multi-document summarization (MDS) task. Existing MDS datasets usually focus on producing structureless summaries that cover only a few input documents. Meanwhile, previous work on structured summary generation focuses on summarizing a single document into a multi-section summary. These existing datasets and methods cannot meet the requirements of summarizing numerous academic papers into a structured summary. To address the scarcity of available data, we propose BigSurvey, the first large-scale dataset for generating comprehensive summaries of numerous academic papers on each topic. We collect target summaries from more than seven thousand survey papers and use the abstracts of their 430 thousand reference papers as input documents. To organize the diverse content from dozens of input documents and to process long text sequences efficiently, we propose a summarization method named category-based alignment and sparse transformer (CAST). The experimental results show that our CAST method outperforms various advanced summarization methods.
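The abstract names a sparse transformer as the way to keep long input sequences tractable but does not describe the attention pattern. As a rough illustration of the general idea only, the sketch below builds a sliding-window attention mask and compares the number of attended token pairs with full attention. The window pattern, the function names, and the toy sizes are assumptions made for this example and are not taken from CAST.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Assumed pattern (not from the paper): each token sees only neighbors
    within `window` positions on either side, plus itself.
    """
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the (query, key) pairs that the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    seq_len, window = 8, 1  # toy sizes chosen for readability
    mask = sliding_window_mask(seq_len, window)
    print("full attention pairs:  ", seq_len * seq_len)
    print("sparse attention pairs:", attended_pairs(mask))
```

With a fixed window, the number of attended pairs grows linearly with sequence length rather than quadratically, which is the usual motivation for sparse attention when the input concatenates dozens of abstracts.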


