The rapid growth of electronic health record (EHR) datasets opens up promising opportunities to understand human diseases in a systematic way. However, effective extraction of clinical knowledge from EHR data has been hindered by its sparsity and noise. We present KG-ETM, an end-to-end knowledge graph-based multimodal embedded topic model. KG-ETM distills latent disease topics from EHR data by learning embeddings from medical knowledge graphs. We applied KG-ETM to a large-scale EHR dataset consisting of over 1 million patients and evaluated its performance on EHR reconstruction and drug imputation. KG-ETM demonstrated superior performance over the alternative methods on both tasks. Moreover, our model learned clinically meaningful graph-informed embeddings of the EHR codes. In addition, our model is able to discover interpretable and accurate patient representations for patient stratification and drug recommendations.