We present a literature review of Automatic Text Summarization (ATS) systems, following a citation-based approach. Starting from a set of popular, well-known papers on each topic we want to cover, we track their "backward citations" (papers cited by this initial set) and their "forward citations" (newer papers that cite this initial set). We organize the diverse approaches to ATS by the mechanisms they use to generate a summary. Besides the methods themselves, we provide an extensive review of the datasets available for summarization tasks and of the methods used to evaluate the quality of summaries. Finally, we present an empirical exploration of these methods using the CNN Corpus dataset, which provides golden summaries for extractive and abstractive methods.