To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help automate or assist in parts of this expensive process. In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature. This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain. We experiment with a summarization system based on BART, with promising early results. We formulate our summarization inputs and targets in both free text and structured forms and modify a recently proposed metric to assess the quality of our system's generated summaries. Data and models are available at https://github.com/allenai/ms2.