PMC-Patients: A Large-scale Dataset of Patient Summaries and Relations for Benchmarking Retrieval-based Clinical Decision Support Systems

Objective: Retrieval-based Clinical Decision Support (ReCDS) can aid clinical workflow by providing relevant literature and similar patients for a given patient. However, the development of ReCDS systems has been severely obstructed by the lack of diverse patient collections and publicly available large-scale patient-level annotation datasets. In this paper, we aim to define and benchmark two ReCDS tasks: Patient-to-Article Retrieval (ReCDS-PAR) and Patient-to-Patient Retrieval (ReCDS-PPR) using a novel dataset called PMC-Patients.

Methods: We extract patient summaries from PubMed Central articles using simple heuristics and utilize the PubMed citation graph to define patient-article relevance and patient-patient similarity. We also implement and evaluate several ReCDS systems on the PMC-Patients benchmarks, including sparse retrievers, dense retrievers, and nearest neighbor retrievers. We conduct several case studies to show the clinical utility of PMC-Patients.

Results: PMC-Patients contains 167k patient summaries with 3.1M patient-article relevance annotations and 293k patient-patient similarity annotations, which is the largest-scale resource for ReCDS and also one of the largest patient collections. Human evaluation and analysis show that PMC-Patients is a diverse dataset with high-quality annotations. The evaluation of various ReCDS systems shows that the PMC-Patients benchmark is challenging and calls for further research.

Conclusion: We present PMC-Patients, a large-scale, diverse, and publicly available patient summary dataset with the largest-scale patient-level relation annotations. Based on PMC-Patients, we formally define two benchmark tasks for ReCDS systems and evaluate various existing retrieval methods. PMC-Patients can largely facilitate methodology research on ReCDS systems and shows real-world clinical utility.
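
To make the Patient-to-Article Retrieval setting more concrete, the following is a minimal sketch of a sparse retriever and its evaluation. It is not the authors' implementation: BM25 is assumed here only as a typical example of a sparse retriever, the class `BM25`, the helper functions, the toy article texts, and the relevance labels are all hypothetical, and recall@k and reciprocal rank are common retrieval metrics chosen for illustration (the abstract does not say which metrics the benchmark reports).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class BM25:
    """Tiny BM25 index over a list of documents (illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in tokens]
        self.avg_len = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in tokens]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for t in tokens:
            df.update(set(t))
        n = len(docs)
        self.idf = {w: math.log(1 + (n - c + 0.5) / (c + 0.5)) for w, c in df.items()}

    def score(self, query, i):
        """BM25 score of document i for the query."""
        total = 0.0
        for w in tokenize(query):
            f = self.tf[i].get(w, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avg_len)
            total += self.idf[w] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        """Document indices sorted by descending score (ties by index)."""
        scored = [(self.score(query, i), i) for i in range(len(self.tf))]
        return [i for _, i in sorted(scored, key=lambda x: (-x[0], x[1]))]


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for pos, i in enumerate(ranked, start=1):
        if i in relevant:
            return 1.0 / pos
    return 0.0


# Toy corpus and query: invented for illustration, not taken from PMC-Patients.
articles = [
    "case report of cardiac sarcoidosis with ventricular tachycardia",
    "management of pediatric asthma exacerbation in the emergency department",
    "ventricular tachycardia ablation outcomes in cardiac sarcoidosis patients",
    "dietary guidelines for type 2 diabetes",
]
patient_summary = (
    "middle-aged man with cardiac sarcoidosis presenting with ventricular tachycardia"
)
relevant = {0, 2}  # hypothetical relevance annotations for this patient

ranked = BM25(articles).rank(patient_summary)
print("ranking:", ranked)
print("recall@2:", recall_at_k(ranked, relevant, 2))
print("reciprocal rank:", reciprocal_rank(ranked, relevant))
```

In a real benchmark run the same loop would be repeated over every query patient and the metrics averaged. A dense retriever would replace the `score` method with a similarity between learned embeddings while leaving the evaluation code unchanged.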

