Automatic text summarization has been widely studied as an important task in natural language processing. Traditionally, various feature engineering and machine learning based systems have been proposed for both extractive and abstractive text summarization. Recently, deep learning based, and specifically Transformer-based, systems have become immensely popular. Summarization is a cognitively challenging task: extracting summary-worthy sentences is laborious, and expressing semantics concisely during abstractive summarization is complicated. In this paper, we specifically address the problem of summarizing scientific research papers from multiple domains. We differentiate between two types of summaries, namely, (a) LaySumm: a very short summary that captures the essence of the research paper in lay terms, restricting overly specific technical jargon, and (b) LongSumm: a much longer, detailed summary aimed at providing specific insights into the various ideas touched upon in the paper. While leveraging the latest Transformer-based models, our systems are simple and intuitive, and are based on how specific paper sections contribute to human summaries of the two types described above. Evaluations against gold-standard summaries using ROUGE metrics prove the effectiveness of our approach. On blind test corpora, our system ranks first and third for the LongSumm and LaySumm tasks, respectively.