Electronic Health Records (EHRs) contain a large amount of missing data because patient conditions and treatment needs vary. Imputing missing values is widely considered an effective way to address this challenge. Existing work treats the imputation method and the prediction model as two independent parts of an EHR-based machine learning system. We propose an integrated end-to-end approach based on a Compound Density Network (CDNet), which allows the imputation method and the prediction model to be tuned together within a single framework. CDNet consists of a Gated Recurrent Unit (GRU), a Mixture Density Network (MDN), and a Regularized Attention Network (RAN). The GRU serves as a latent variable model of the EHR data. The MDN is designed to sample the latent variables generated by the GRU. The RAN serves as a regularizer for less reliable imputed values. The architecture of CDNet enables the GRU and the MDN to iteratively leverage each other's output to impute missing values, leading to more accurate and robust predictions. We validate CDNet on the mortality prediction task on the MIMIC-III dataset. Our model outperforms state-of-the-art models by significant margins. We also show empirically that regularizing imputed values is a key factor in this superior prediction performance. An analysis of prediction uncertainty shows that our model captures both aleatoric and epistemic uncertainty, which gives model users a better understanding of the model's results.
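To make the sampling-based imputation idea concrete, the following is a minimal NumPy sketch of how missing entries could be filled with draws from a Gaussian mixture whose parameters come from an MDN head. It is an illustration under stated assumptions, not the CDNet implementation: the function names `sample_mdn` and `impute`, the one-dimensional mixture, and the convention that `mask == 1` marks an observed value are all hypothetical, and the GRU and RAN components are omitted.

```python
import numpy as np

def sample_mdn(pi, mu, sigma, rng):
    """Draw one sample per row from a 1-D Gaussian mixture.

    pi, mu, sigma: arrays of shape (n, k) holding mixture weights,
    component means, and component standard deviations (assumed MDN outputs).
    """
    n, k = pi.shape
    # Pick one mixture component per row by inverting the cumulative weights.
    cum = np.cumsum(pi, axis=1)
    u = rng.random((n, 1))
    comp = (u > cum).sum(axis=1)
    comp = np.minimum(comp, k - 1)  # guard against rounding in the cumsum
    rows = np.arange(n)
    # Sample from the chosen Gaussian component of each row.
    return rng.normal(mu[rows, comp], sigma[rows, comp])

def impute(x, mask, pi, mu, sigma, rng):
    """Keep observed entries (mask == 1); fill missing ones with mixture samples."""
    samples = sample_mdn(pi, mu, sigma, rng)
    return np.where(mask == 1, x, samples)
```

In an end-to-end setting such as the one described above, the mixture parameters would presumably be produced from the recurrent hidden state at each time step and the imputed values fed back into the recurrent model; this sketch covers only the sampling and masking step.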