Objective: Social determinants of health (SDOH) affect health outcomes and are documented in the electronic health record (EHR) both as structured data and in unstructured clinical notes. Clinical notes, however, often contain more comprehensive SDOH information, describing aspects such as status, severity, and temporality. This work has two primary objectives: (i) to develop a natural language processing (NLP) information extraction model that captures detailed SDOH information, and (ii) to evaluate the information gain achieved by applying this SDOH extractor to clinical narratives and combining the extracted representations with existing structured data.

Materials and Methods: We developed a novel SDOH extractor that uses a deep learning entity and relation extraction architecture to characterize SDOH across multiple dimensions. In an EHR case study, we applied the extractor to a large clinical data set comprising 225,089 patients and 430,406 notes with social history sections, and compared the extracted SDOH information with existing structured data.

Results: The SDOH extractor achieved an F1 score of 0.86 on a held-out test set. In the EHR case study, the extracted SDOH information complemented existing structured data: for 32% of homeless patients, 19% of current tobacco users, and 10% of drug users, these health risk factors were documented only in the clinical narrative.

Conclusions: Using EHR data to identify SDOH health risk factors and social needs may improve patient care and outcomes. Semantic representations of text-encoded SDOH information can augment existing structured data, and the resulting, more comprehensive SDOH representation can help health systems identify and address these social needs.
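The information-gain comparison in the case study can be illustrated with a small sketch. The abstract does not state exactly how the percentages were computed, so the code below assumes that the denominator is the set of patients flagged by either source (structured data or extracted notes) and the numerator is the subset flagged by the extractor alone. The `SdohEvent` record, its label values (such as `"current"`), and the helper functions are hypothetical illustrations, not the authors' schema or implementation.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class SdohEvent:
    # Hypothetical extractor output for one SDOH mention in a social history section.
    patient_id: str
    sdoh_type: str  # illustrative labels, e.g. "tobacco", "drug", "housing"
    status: str     # illustrative labels, e.g. "current", "past", "none"


def positive_patients(events: Iterable[SdohEvent], sdoh_type: str,
                      statuses: set[str]) -> set[str]:
    """Patients with at least one extracted event of the given type and status."""
    return {e.patient_id for e in events
            if e.sdoh_type == sdoh_type and e.status in statuses}


def notes_only_share(structured_ids: set[str], extracted_ids: set[str]) -> float:
    """Share of patients flagged by either source that only the notes identify.

    Assumption: denominator = union of both sources, numerator = notes-only patients.
    """
    identified = structured_ids | extracted_ids
    if not identified:
        return 0.0
    return len(extracted_ids - structured_ids) / len(identified)


if __name__ == "__main__":
    # Toy data, not from the study.
    events = [
        SdohEvent("p1", "tobacco", "current"),
        SdohEvent("p2", "tobacco", "past"),
        SdohEvent("p3", "tobacco", "current"),
        SdohEvent("p4", "drug", "current"),
    ]
    structured_current_tobacco = {"p1", "p5"}
    extracted_current_tobacco = positive_patients(events, "tobacco", {"current"})
    share = notes_only_share(structured_current_tobacco, extracted_current_tobacco)
    print(f"{share:.1%}")  # p3 is the only notes-only patient out of p1, p3, p5
```

Filtering on the extracted `status` before comparing sources is what allows a "current tobacco user" count to exclude past use, which is one reason the richer text-derived representation matters beyond a simple presence flag.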