Abstractive summarization is the process of generating novel sentences from the information extracted from the original text document while retaining the context. Due to the underlying complexities of abstractive summarization, most past research has focused on the extractive approach. Nevertheless, with the success of the sequence-to-sequence (seq2seq) model, abstractive summarization has become more viable. Although a significant amount of notable research on abstractive summarization has been done in English, only a few works address Bengali abstractive news summarization (BANS). In this article, we present a seq2seq-based Long Short-Term Memory (LSTM) network model with attention at the encoder-decoder. Our proposed system deploys a local attention-based model that produces long sequences of words in lucid, human-like sentences conveying the noteworthy information of the original document. We also prepared a dataset of more than 19k articles and corresponding human-written summaries collected from bangla.bdnews24.com, which is to date the most extensive dataset for Bengali news document summarization, and published it publicly on Kaggle. We evaluated our model qualitatively and quantitatively and compared it with other published results. It showed significant improvement in human evaluation scores over state-of-the-art approaches for BANS.
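To make the architecture described above concrete, the following is a minimal sketch of an LSTM encoder-decoder with attention in PyTorch. It uses global dot-product (Luong-style) attention for brevity rather than the paper's local attention, and all hyperparameters (vocab_size, embed_dim, hidden_dim) are illustrative assumptions, not the settings used in this work.

```python
# Minimal sketch: LSTM seq2seq with dot-product attention over encoder states.
# Hyperparameters below are illustrative placeholders, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqAttention(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn_combine = nn.Linear(hidden_dim * 2, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the source article into per-token hidden states.
        enc_out, (h, c) = self.encoder(self.embedding(src_ids))
        # Decode the summary, initialized with the encoder's final state.
        dec_out, _ = self.decoder(self.embedding(tgt_ids), (h, c))
        # Score each source position against every decoder step.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))  # (B, T_tgt, T_src)
        weights = F.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_out)                 # (B, T_tgt, H)
        # Combine attention context with the decoder state, project to vocab logits.
        combined = torch.tanh(
            self.attn_combine(torch.cat([dec_out, context], dim=-1)))
        return self.out(combined)

# Toy usage: a batch of 2 articles (length 40) and summaries (length 10).
model = Seq2SeqAttention()
src = torch.randint(0, 5000, (2, 40))
tgt = torch.randint(0, 5000, (2, 10))
logits = model(src, tgt)  # (2, 10, 5000), trainable with cross-entropy loss
```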