Academic research is an exploratory activity aimed at discovering new solutions to problems. By its nature, a research paper therefore reviews the literature to distinguish its novel contributions from prior work. In natural language processing, this review usually appears in the "Related Work" section. The task of related work generation aims to automatically generate the related work section given the rest of the research paper and a list of papers to cite. Prior work on this task has treated the sentence as the basic unit of generation, neglecting the fact that related work sections consist of variable-length text fragments derived from different information sources. As a first step toward a linguistically motivated related work generation framework, we present the Citation Oriented Related Work Annotation (CORWA) dataset, which labels different types of citation text fragments from different information sources. We train a strong baseline model that automatically tags CORWA labels on massive amounts of unlabeled related work section text. We further propose a novel framework for human-in-the-loop, iterative, abstractive related work generation.