Most research on social determinants of health (SDoH) has focused on physician notes or structured elements of the electronic medical record (EMR). We hypothesized that clinical notes from social workers, whose role is to ameliorate social and economic factors, might provide a richer source of data on SDoH. We sought to perform topic modeling to identify robust topics of discussion within a large cohort of social work notes. We retrieved a diverse, deidentified corpus of 0.95 million clinical social work notes from 181,644 patients at the University of California, San Francisco. We used word frequency analysis and Latent Dirichlet Allocation (LDA) topic modeling to characterize this corpus and identify potential topics of discussion. Word frequency analysis identified both medical and non-medical terms associated with specific ICD-10 chapters. The LDA topic modeling analysis extracted 11 topics related to SDoH risk factors, including financial status, abuse history, social support, risk of death, and mental health. In addition, the topic modeling approach captured the variation between different types of social work notes and across patients with different types of diseases or conditions. We demonstrated that social work notes contain rich, unique, and otherwise unobtainable information on an individual's SDoH.
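The two analysis steps named above, word frequency analysis and LDA topic modeling, can be illustrated with a minimal sketch. The abstract does not say which LDA implementation or preprocessing the authors used, so everything here is an assumption made for illustration: the toy notes, the stopword list, the hyperparameters `alpha` and `beta`, and the choice of a collapsed Gibbs sampler written in plain Python.

```python
import random
from collections import Counter

# Hypothetical toy notes; the real corpus is deidentified clinical text.
NOTES = [
    "patient reports housing instability and financial stress",
    "family provides strong social support after discharge",
    "financial assistance application for rent and housing",
    "caregiver and family support discussed with patient",
    "screened for depression anxiety and mental health needs",
    "mental health referral placed for anxiety and depression",
]
# Illustrative stopword list, not the one used in the study.
STOPWORDS = {"and", "for", "with", "the", "after", "patient"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def word_frequencies(docs):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns the vocabulary and one word-probability row per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = index[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed topic-word distributions; each row sums to 1.
    phi = [
        [(topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
         for v in range(n_words)]
        for t in range(n_topics)
    ]
    return vocab, phi


docs = [tokenize(note) for note in NOTES]
freqs = word_frequencies(docs)
print(freqs.most_common(5))

vocab, phi = lda_gibbs(docs, n_topics=3)
for t, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:4]
    print(f"topic {t}:", [vocab[v] for v in top])
```

On a corpus of the size described, a library implementation would be the practical choice. The pure-Python sampler is used here only to keep the sketch self-contained, and the number of topics (3) is arbitrary rather than the 11 reported in the study.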