Recently, there has been great interest in applying deep learning models to the prediction of clinical events from electronic health records (EHR) data. In EHR data, a patient's history is often represented as a sequence of visits, each containing multiple events. As a result, deep learning models developed for sequence modeling, such as recurrent neural networks (RNNs), are a common architecture for EHR-based clinical event prediction models. Although a large variety of RNN models have been proposed in the literature, it is unclear whether complex architectural innovations offer superior predictive performance. To move this field forward, a rigorous evaluation of the various methods is needed. In this study, we conducted a thorough benchmark of RNN architectures for modeling EHR data. We used two prediction tasks: the risk of developing heart failure and the risk of early readmission after inpatient hospitalization. We found that simple gated RNN models, including GRUs and LSTMs, often offer competitive results when properly tuned with Bayesian Optimization, in line with similar findings in the natural language processing (NLP) domain. For reproducibility, our codebase is shared at https://github.com/ZhiGroup/pytorch_ehr.
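The sequence-of-visits representation and a simple gated RNN over it can be sketched as follows. This is a minimal, dependency-free illustration under our own assumptions. The event codes (`dx_001`, ...) and the toy patient are hypothetical. The weights are random and untrained, biases are omitted, and the logistic risk head is a stand-in. It is not the code from the linked repository. Each visit is encoded as a multi-hot vector over the event vocabulary, and a GRU cell consumes the visits in order. The final hidden state is mapped to a probability.

```python
import math
import random
from typing import Dict, List, Sequence

Visit = List[str]  # one visit = the event codes recorded during it (hypothetical codes)


def build_vocab(patients: Sequence[Sequence[Visit]]) -> Dict[str, int]:
    """Assign an integer index to every distinct event code, in first-seen order."""
    vocab: Dict[str, int] = {}
    for visits in patients:
        for visit in visits:
            for code in visit:
                vocab.setdefault(code, len(vocab))
    return vocab


def multi_hot(visit: Visit, vocab: Dict[str, int]) -> List[float]:
    """Encode one visit (a set of events) as a multi-hot vector over the vocabulary."""
    vec = [0.0] * len(vocab)
    for code in visit:
        vec[vocab[code]] = 1.0
    return vec


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class TinyGRU:
    """Minimal GRU cell with random, untrained weights (illustrative only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # W_* act on the visit vector, U_* on the previous hidden state; biases omitted.
        self.Wz, self.Uz = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wr, self.Ur = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wn, self.Un = mat(hidden_size, input_size), mat(hidden_size, hidden_size)

    def step(self, x: List[float], h: List[float]) -> List[float]:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset applied to h before the recurrent product
        n = [math.tanh(a + b) for a, b in zip(matvec(self.Wn, x), matvec(self.Un, rh))]  # candidate
        # New state: convex combination of the candidate and the previous state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, visits: Sequence[Visit], vocab: Dict[str, int]) -> List[float]:
        h = [0.0] * self.hidden_size
        for visit in visits:  # visits are consumed in chronological order
            h = self.step(multi_hot(visit, vocab), h)
        return h


def predict_risk(h: List[float], w: List[float], b: float = 0.0) -> float:
    """Hypothetical logistic head mapping the final hidden state to an event probability."""
    return sigmoid(sum(hi * wi for hi, wi in zip(h, w)) + b)


# Toy usage: one hypothetical patient with three visits.
patients = [[["dx_001", "dx_002"], ["dx_003"], ["dx_001", "dx_004"]]]
vocab = build_vocab(patients)
model = TinyGRU(input_size=len(vocab), hidden_size=8, seed=0)
h = model.encode(patients[0], vocab)
risk = predict_risk(h, [0.1] * 8)
print(f"predicted risk: {risk:.3f}")
```

An LSTM variant would replace `TinyGRU.step` with input, forget, and output gates plus a separate cell state. In practice, the gate weights, hidden size, and similar hyperparameters are what a tuner such as Bayesian Optimization would search over.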