Lay Text Summarisation Using Natural Language Processing: A Narrative Literature Review

Summarising research results in plain language is crucial for promoting public understanding of research findings. Using Natural Language Processing to generate lay summaries has the potential to relieve researchers' workload and bridge the gap between science and society. The aim of this narrative literature review is to describe and compare the text summarisation approaches used to generate lay summaries. We searched Web of Science, Google Scholar, IEEE Xplore, the Association for Computing Machinery Digital Library and arXiv for articles published up to 6 May 2022. We included original studies on automatic text summarisation methods for generating lay summaries. We screened 82 articles and included eight relevant papers published between 2020 and 2021, all using the same dataset. The results show that transformer-based methods such as Bidirectional Encoder Representations from Transformers (BERT) and Pre-training with Extracted Gap-sentences for Abstractive Summarization (PEGASUS) dominate the landscape of lay text summarisation, with all but one study using these methods. A hybrid approach combining extractive and abstractive summarisation methods was found to be most effective. Furthermore, pre-processing of the input text (e.g. applying extractive summarisation), or determining which sections of a text to include, appears critical. The included studies relied on evaluation metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), which do not consider readability. To conclude, automatic lay text summarisation is under-explored. Future research should consider lay summarisation of long documents, including clinical trial reports, and the development of evaluation metrics that take the readability of the lay summary into account.
