Deep Stable Representation Learning on Electronic Health Records

Deep learning models have achieved promising disease prediction performance on patients' Electronic Health Records (EHR). However, most models are developed under the I.I.D. hypothesis and fail to account for agnostic distribution shifts, which diminishes their ability to generalize to Out-Of-Distribution (OOD) data. In this setting, models exploit spurious statistical correlations that may change across environments, which can lead to sub-optimal performance. In particular, the unstable correlation between procedures and diagnoses present in the training distribution can cause a spurious correlation between historical EHR and future diagnoses. To address this problem, we propose a causal representation learning method called Causal Healthcare Embedding (CHE). CHE aims to eliminate the spurious statistical relationship by removing the dependencies between diagnoses and procedures. We introduce the Hilbert-Schmidt Independence Criterion (HSIC) to measure the degree of independence between the embedded diagnosis and procedure features. Based on a causal analysis, we apply a sample weighting technique to remove this spurious relationship, enabling stable learning on EHR across different environments. Moreover, CHE can be used as a flexible plug-and-play module that enhances existing deep learning models on EHR. Extensive experiments on two public datasets with five state-of-the-art baselines show that CHE improves the prediction accuracy of deep learning models on out-of-distribution data by a large margin. In addition, the interpretability study shows that CHE can leverage causal structures to reflect a more reasonable contribution of historical records to predictions.
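To make the HSIC-based independence measure concrete, the sketch below computes the standard biased empirical HSIC estimate, tr(KHLH) / (n - 1)^2, between two sets of embedded features. It is a minimal illustration and not the CHE implementation: the abstract does not specify the kernel, bandwidth, or estimator, so the Gaussian kernel, the `sigma` parameter, and the function names `rbf_kernel` and `hsic` are all assumptions, and the sample weighting step is not shown.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x (assumed kernel choice)."""
    sq_norms = np.sum(x ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped to guard against round-off.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between paired samples x and y.

    x: array of shape (n, d_x), e.g. hypothetical diagnosis embeddings.
    y: array of shape (n, d_y), e.g. hypothetical procedure embeddings.
    Larger values indicate stronger statistical dependence.
    """
    n = x.shape[0]
    k = rbf_kernel(x, sigma)
    l = rbf_kernel(y, sigma)
    # Centering matrix H = I - (1/n) * 1 1^T.
    h = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(k @ h @ l @ h) / (n - 1) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    diag = rng.normal(size=(200, 2))                 # stand-in for diagnosis features
    proc_dep = diag + 0.1 * rng.normal(size=(200, 2))  # strongly dependent on diag
    proc_ind = rng.normal(size=(200, 2))             # drawn independently of diag
    print("dependent:  ", hsic(diag, proc_dep))
    print("independent:", hsic(diag, proc_ind))
```

On the synthetic data above, the dependent pair should score noticeably higher than the independent pair. In a training setting, a quantity of this kind would typically be minimized, for example with respect to per-sample weights, to decorrelate the two feature sets.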
