Objective. Social determinants of health (SDoH) are well known to affect patients' healthcare quality and to contribute to health disparities. Many SDoH items are not coded in structured fields in electronic health records. They are often captured only in free-text clinical notes, and few methods exist for extracting them automatically. We explore a multi-stage pipeline that combines named entity recognition (NER), relation classification (RC), and text classification to extract SDoH information from clinical notes automatically. Materials and Methods. The study uses the N2C2 Shared Task data, collected from two sources of clinical notes: MIMIC-III and University of Washington Harborview Medical Centers. The data contain 4480 social history sections fully annotated for twelve SDoHs. To handle overlapping entities, we developed a novel marker-based NER model and used it in a multi-stage pipeline to extract SDoH information from clinical notes. Results. Our marker-based system outperformed state-of-the-art span-based models at handling overlapping entities, as measured by the overall Micro-F1 score. It also achieved state-of-the-art performance compared to the shared task methods. Conclusion. The major finding of this study is that the multi-stage pipeline effectively extracts SDoH information from clinical notes. This approach can potentially improve the understanding and tracking of SDoHs in clinical settings. However, error propagation may be an issue, and further research is needed to improve the extraction of entities with complex semantic meanings and of low-resource entities using external knowledge.
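For readers unfamiliar with the metric, the sketch below shows one common way to compute a micro-averaged F1 score over extracted entities. It is an illustration only, not the shared task's official scorer: it assumes exact span-and-label matching, and the function name `micro_f1`, the `(start, end, label)` tuple layout, and the labels in the example are hypothetical. Because entities are compared as sets of spans, two gold entities that overlap in the text are simply two separate items.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over entities.

    gold, pred: lists with one set per document; each set holds
    (start, end, label) tuples. Exact match is assumed (an assumption,
    not necessarily the shared task's matching criterion).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted entities that match gold exactly
        fp += len(p - g)   # predicted entities with no gold match
        fn += len(g - p)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the first document has two overlapping gold entities.
gold = [{(0, 1, "A"), (0, 3, "B")}, {(2, 4, "A")}]
pred = [{(0, 1, "A"), (1, 3, "B")}, {(2, 4, "A")}]
score = micro_f1(gold, pred)
```

Counts are pooled across all documents and entity types before precision and recall are computed, so frequent entity types weigh more heavily in the score than rare ones.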