Automated Identification of Eviction Status from Electronic Health Record Notes

Objective: Evictions are important social and behavioral determinants of health. They are associated with a cascade of negative events that can lead to unemployment, housing insecurity or homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction status from electronic health record (EHR) notes. Materials and Methods: We first defined eviction status (eviction presence and eviction period) and then annotated eviction status in 5000 EHR notes from the Veterans Health Administration (VHA). We developed a novel model, KIRESH, that substantially outperforms other state-of-the-art approaches, such as fine-tuned pre-trained language models like BioBERT and BioClinicalBERT. Moreover, we designed a novel prompt that further improves model performance by exploiting the intrinsic connection between the two sub-tasks of eviction presence prediction and eviction period prediction. Finally, we applied temperature scaling-based calibration to our KIRESH-Prompt method to avoid the over-confidence issues arising from the imbalanced dataset. Results: KIRESH-Prompt substantially outperformed strong baseline models, including fine-tuned BioClinicalBERT, achieving 0.74672 MCC, 0.71153 Macro-F1, and 0.83396 Micro-F1 in predicting eviction period and 0.66827 MCC, 0.62734 Macro-F1, and 0.7863 Micro-F1 in predicting eviction presence. We also conducted additional experiments on a benchmark social and behavioral determinants of health (SBDH) dataset to demonstrate the generalizability of our methods. Conclusion and Future Work: KIRESH-Prompt substantially improves eviction status classification. We plan to deploy KIRESH-Prompt on VHA EHRs as an eviction surveillance system to help address US Veterans' housing insecurity.
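The abstract mentions temperature scaling-based calibration without describing it. As a rough illustration of the general technique (not the authors' implementation), the sketch below fits a single temperature T on held-out logits by minimizing the negative log-likelihood, then divides logits by T before the softmax. The three-class logits, the labels, the grid range, and all function names are assumptions made for this example only.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 flattens the output)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels):
    """Pick the temperature from a fixed grid (0.05 .. 10.0) with the lowest NLL.

    A grid search is used here for simplicity; gradient-based optimizers
    are a common alternative.
    """
    grid = [0.05 * k for k in range(1, 201)]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

# Hypothetical validation logits for three classes; the third row is a
# confident but wrong prediction, which is what makes the model over-confident.
val_logits = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [4.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
val_labels = [0, 1, 1, 2]

T = fit_temperature(val_logits, val_labels)
print("fitted temperature:", T)
print("uncalibrated:", softmax(val_logits[2]))
print("calibrated:  ", softmax(val_logits[2], T))
```

Dividing by a positive temperature never changes which class has the highest probability, so calibration of this kind leaves the classification metrics untouched and only adjusts the confidence attached to each prediction.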

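The results are reported as MCC, Macro-F1, and Micro-F1, three metrics that can diverge noticeably on an imbalanced dataset. The following sketch shows how they are commonly computed for single-label, multi-class predictions, using the standard multi-class generalization of the Matthews correlation coefficient. The integer class IDs and the toy labels are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)  # every class weighs the same
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    denom = 2 * tp_all + fp_all + fn_all
    micro = 2 * tp_all / denom if denom else 0.0  # every sample weighs the same
    return macro, micro

def mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    t_k, p_k = Counter(y_true), Counter(y_pred)
    labels = set(t_k) | set(p_k)
    cov_tp = correct * n - sum(t_k[k] * p_k[k] for k in labels)
    cov_tt = n * n - sum(t_k[k] ** 2 for k in labels)
    cov_pp = n * n - sum(p_k[k] ** 2 for k in labels)
    denom = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0

# Hypothetical imbalanced example: class 0 dominates, class 2 is always missed.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0]

macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print("Macro-F1:", macro_f1)
print("Micro-F1:", micro_f1)
print("MCC:", mcc(y_true, y_pred))
```

In this single-label setting Micro-F1 equals plain accuracy, so it is dominated by the majority class, whereas Macro-F1 and MCC penalize the missed minority class more heavily.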