CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

Continuity of care is crucial to ensuring positive health outcomes for patients discharged from an inpatient hospital setting, and improved information sharing can help. To share information, caregivers write discharge notes containing action items for patients and their future caregivers, but these action items are easily lost because the documents are long. In this work, we describe our creation of a dataset of clinical action items annotated over MIMIC-III, the largest publicly available dataset of real clinical notes. This dataset, which we call CLIP, is annotated by physicians and covers 718 documents representing 100K sentences. We describe the task of extracting the action items from these documents as multi-aspect extractive summarization, with each aspect representing a type of action to be taken. We evaluate several machine learning models on this task, and show that the best models exploit in-domain language model pre-training on 59K unannotated documents, and incorporate context from neighboring sentences. We also propose an approach to pre-training data selection that allows us to explore the trade-off between size and domain-specificity of pre-training datasets for this task.
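As a rough illustration of the task framing in the abstract, the sketch below treats multi-aspect extractive summarization as sentence-level multi-label tagging: each sentence is paired with its neighboring sentences as context, and sentences tagged with at least one aspect are grouped into a per-aspect summary. This is not the authors' implementation. The function names, the `[SEP]` separator, the window size, and the aspect labels (`appointment`, `medication`) are hypothetical placeholders chosen for the example, and the classifier that would produce the label sets is left out.

```python
from collections import defaultdict
from typing import Dict, List, Set


def with_context(sentences: List[str], window: int = 1, sep: str = " [SEP] ") -> List[str]:
    """Pair each sentence with up to `window` neighbors on each side.

    The separator string and window size are assumptions for this sketch.
    """
    inputs = []
    for i, sent in enumerate(sentences):
        left = " ".join(sentences[max(0, i - window):i])
        right = " ".join(sentences[i + 1:i + 1 + window])
        inputs.append(sep.join([left, sent, right]))
    return inputs


def summarize_by_aspect(sentences: List[str], labels: List[Set[str]]) -> Dict[str, List[str]]:
    """Group sentences by predicted aspect; a sentence may appear under several aspects."""
    summary = defaultdict(list)
    for sent, aspects in zip(sentences, labels):
        for aspect in sorted(aspects):
            summary[aspect].append(sent)
    return dict(summary)


# Toy input; the label sets stand in for the output of a hypothetical classifier.
sents = ["Follow up with cardiology in 2 weeks.", "Patient was stable.", "Continue aspirin daily."]
preds = [{"appointment"}, set(), {"medication"}]
model_inputs = with_context(sents)
aspect_summary = summarize_by_aspect(sents, preds)
```

Here `model_inputs` holds the context-augmented strings a sentence classifier could consume, and `aspect_summary` maps each aspect to the sentences extracted for it. Sentences with an empty label set are simply left out of the summary.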
