Objective: Evictions are important social and behavioral determinants of health. They are associated with a cascade of negative events that can lead to unemployment, housing insecurity and homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing (NLP) system to automatically detect eviction status from electronic health record (EHR) notes.

Materials and Methods: We first defined eviction status (eviction presence and eviction period) and then annotated eviction status in 5,000 EHR notes from the Veterans Health Administration (VHA). We developed a novel model, KIRESH, which substantially outperforms other state-of-the-art approaches, such as fine-tuning the pre-trained language models BioBERT and BioClinicalBERT. We further designed a novel prompt that improves model performance by exploiting the intrinsic connection between the two sub-tasks, eviction presence prediction and eviction period prediction. Finally, we applied temperature scaling-based calibration to our KIRESH-Prompt method to mitigate the over-confidence arising from the imbalanced dataset.

Results: KIRESH-Prompt substantially outperformed strong baseline models, including fine-tuned BioClinicalBERT. For eviction period prediction, it achieved 0.74672 MCC (Matthews correlation coefficient), 0.71153 Macro-F1, and 0.83396 Micro-F1; for eviction presence prediction, it achieved 0.66827 MCC, 0.62734 Macro-F1, and 0.7863 Micro-F1. We also conducted additional experiments on a benchmark social and behavioral determinants of health (SBDH) dataset to demonstrate the generalizability of our methods.

Conclusion and Future Work: KIRESH-Prompt substantially improves eviction status classification. We plan to deploy KIRESH-Prompt in the VHA EHR system as an eviction surveillance tool to help address housing insecurity among US Veterans.
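In standard temperature scaling, a single scalar T > 0 divides a trained classifier's logits before the softmax, and T is fitted on held-out data by minimizing negative log-likelihood. Because dividing every logit by the same positive T never changes which class has the largest logit, predicted labels (and therefore MCC and F1) stay the same; only the confidence attached to them changes. The dependency-free sketch below illustrates that idea. It is not the authors' KIRESH-Prompt implementation: the function names and toy logits are hypothetical, and a log-spaced grid search stands in for the gradient-based optimizer usually used to fit T.

```python
import math

def log_softmax(logits, temperature):
    """Log-probabilities of one logit vector after dividing every logit by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]

def calibrated_probs(logits, temperature):
    """Temperature-scaled softmax probabilities; the argmax is unchanged for any temperature > 0."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]

def mean_nll(logits_list, labels, temperature):
    """Mean negative log-likelihood of the gold labels on held-out data."""
    total = 0.0
    for logits, y in zip(logits_list, labels):
        total -= log_softmax(logits, temperature)[y]
    return total / len(labels)

def fit_temperature(logits_list, labels, t_min=0.05, t_max=10.0, steps=400):
    """Choose the T that minimizes held-out NLL using a log-spaced grid search.

    Hypothetical stand-in for the gradient-based optimizer usually used.
    """
    best_t = 1.0
    best_loss = mean_nll(logits_list, labels, best_t)
    for i in range(steps + 1):
        t = t_min * (t_max / t_min) ** (i / steps)
        loss = mean_nll(logits_list, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t

# Toy, hypothetical validation set: an over-confident 2-class model that is right 3 times out of 4.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 1, 0, 1]

t_hat = fit_temperature(val_logits, val_labels)
print(f"fitted temperature: {t_hat:.2f}")
print("confidence before:", round(max(calibrated_probs([4.0, 0.0], 1.0)), 3))
print("confidence after: ", round(max(calibrated_probs([4.0, 0.0], t_hat)), 3))
```

On this toy set the NLL-optimal temperature is T = 4 / ln 3 ≈ 3.64, which lowers the confidence on the predicted class from about 0.98 to about 0.75, matching the model's 75% validation accuracy. The grid search lands close to that value.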
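The three metrics reported in the Results (MCC, Macro-F1, and Micro-F1) can be computed from gold and predicted labels with a short, dependency-free sketch. It assumes each note receives exactly one label per sub-task (single-label, multi-class). It uses the standard multi-class generalization of MCC, which is the definition scikit-learn's `matthews_corrcoef` follows. The function names and toy labels are hypothetical and are not drawn from the study's data. Macro-F1 averages per-class F1 without weighting, so it is sensitive to rare classes in an imbalanced dataset. Micro-F1 pools counts across classes and, for single-label tasks, equals accuracy.

```python
import math
from collections import Counter

def matthews_corrcoef_multiclass(y_true, y_pred):
    """Multi-class MCC computed from the confusion-matrix marginals."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # how often each label is the gold label
    pred_counts = Counter(y_pred)  # how often each label is predicted
    labels = set(true_counts) | set(pred_counts)
    numerator = correct * n - sum(pred_counts[k] * true_counts[k] for k in labels)
    pred_term = n * n - sum(pred_counts[k] ** 2 for k in labels)
    true_term = n * n - sum(true_counts[k] ** 2 for k in labels)
    if pred_term == 0 or true_term == 0:
        return 0.0  # undefined when all predictions (or all gold labels) fall in one class
    return numerator / math.sqrt(pred_term * true_term)

def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), returning 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

def macro_and_micro_f1(y_true, y_pred):
    """Return (Macro-F1, Micro-F1) over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for k in labels:
        tp = sum(t == k and p == k for t, p in pairs)
        fp = sum(t != k and p == k for t, p in pairs)
        fn = sum(t == k and p != k for t, p in pairs)
        per_class.append((tp, fp, fn))
    # Macro-F1: unweighted mean of per-class F1, so rare classes count as much as common ones.
    macro = sum(f1_from_counts(*c) for c in per_class) / len(per_class)
    # Micro-F1: pool TP/FP/FN over all classes first; equals accuracy for single-label tasks.
    micro = f1_from_counts(*(sum(c[i] for c in per_class) for i in range(3)))
    return macro, micro

# Toy, hypothetical labels for a 3-class task.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(matthews_corrcoef_multiclass(y_true, y_pred), 4))  # 0.5222
macro, micro = macro_and_micro_f1(y_true, y_pred)
print(round(macro, 4), round(micro, 4))  # 0.6556 0.6667
```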