Automatic summarization of natural language is an active topic in computer science research and industry, studied for decades because of its usefulness across many domains. For example, summarization is what makes a review such as this one possible. Research and applications have achieved some success in extractive summarization (where key sentences are selected and curated); however, abstractive summarization (synthesizing and restating content) remains a hard, largely unsolved problem in computer science. This literature review traces historical progress through the current state of the art, comparing dimensions such as: extractive vs. abstractive methods, supervised vs. unsupervised learning, NLP (Natural Language Processing) vs. knowledge-based approaches, deep learning vs. classical algorithms, structured vs. unstructured sources, and measurement metrics such as ROUGE and BLEU. Multiple dimensions are contrasted because current research combines approaches, as seen in the review matrix. Synthesis and critique are provided throughout this summary. The review concludes with insights for improving the measurement of abstractive summarization, with surprising implications for detecting understanding and comprehension in general.
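To illustrate how overlap-based metrics such as ROUGE score a summary, the following is a minimal sketch of ROUGE-1 (unigram overlap) against a single reference. It is a toy illustration, not a full implementation: it assumes lowercase whitespace tokenization and omits stemming, multiple references, and the ROUGE-L variant that standard toolkits provide.

```python
from collections import Counter

def rouge1(candidate: str, reference: str) -> dict:
    """Compute ROUGE-1 precision, recall, and F1 via unigram overlap.

    Toy tokenizer: lowercase whitespace split. Matches are "clipped":
    each candidate token counts at most as often as it appears in the
    reference, as in standard n-gram overlap metrics.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)  # 5 of 6 unigrams overlap: P = R = F1 = 5/6
```

A limitation relevant to this review: because ROUGE rewards surface overlap, a fluent abstractive summary that restates the reference in different words can score poorly, which motivates the search for better measures of abstractive quality.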