Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts. Automatic approaches to lay summarisation can provide significant value in broadening access to scientific literature, enabling greater interdisciplinary knowledge sharing and public understanding of research findings. However, current corpora for this task are limited in size and scope, hindering the development of broadly applicable data-driven approaches. To address these issues, we present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale), each of which contains biomedical journal articles alongside expert-written lay summaries. We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between the datasets that can be leveraged to support the needs of different applications. Finally, we benchmark our datasets using mainstream summarisation approaches and perform a manual evaluation with domain experts, demonstrating their utility and shedding light on the key challenges of this task.