Continuity of care is crucial to ensuring positive health outcomes for patients discharged from an inpatient hospital setting, and improved information sharing can help. To share information, caregivers write discharge notes containing action items to share with patients and their future caregivers, but these action items are easily lost due to the length of the documents. In this work, we describe our creation of a dataset of clinical action items annotated over MIMIC-III, the largest publicly available dataset of real clinical notes. This dataset, which we call CLIP, is annotated by physicians and covers 718 documents representing 100K sentences. We describe the task of extracting the action items from these documents as multi-aspect extractive summarization, with each aspect representing a type of action to be taken. We evaluate several machine learning models on this task, and show that the best models exploit in-domain language model pre-training on 59K unannotated documents and incorporate context from neighboring sentences. We also propose an approach to pre-training data selection that allows us to explore the trade-off between the size and domain-specificity of pre-training datasets for this task.