Document summarization is the process of generating a meaningful and concise summary of a given document that includes its relevant and topically important points. There are two approaches: extractive summarization, which selects the most relevant sentences from the document itself and adds them to the summary, and abstractive summarization, which generates new sentences for the summary. Training a machine learning model to perform tasks that are time-consuming or very difficult for humans to evaluate is a major challenge, and book abstract generation is one such complex task. Traditional machine learning models are increasingly being augmented or replaced by pre-trained transformers. Transformer-based language models trained in a self-supervised fashion are gaining considerable attention when fine-tuned for Natural Language Processing (NLP) downstream tasks such as text summarization. This work is an attempt to use transformer-based techniques for abstract generation.
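As a minimal illustrative sketch of the abstractive approach described above (assuming the Hugging Face transformers library and the facebook/bart-large-cnn checkpoint, neither of which this abstract specifies), a pre-trained sequence-to-sequence transformer can be used to generate a summary:

```python
# Sketch only: assumes the Hugging Face `transformers` library and a
# pre-trained BART summarization checkpoint; the paper does not name
# these particular tools.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

document = (
    "Document summarization is the process of generating a meaningful and "
    "concise summary of a given document that includes its relevant and "
    "topically important points."
)

# Abstractive generation: the model writes new sentences rather than
# copying spans verbatim from the input document.
summary = summarizer(document, max_length=60, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```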