Objective: To improve survival analysis using EHR data, we aim to develop a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Materials and Methods: Our technical contributions are three-folds: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) inferring patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival and guided topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-G using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8,211 subjects with 75,187 outpatient claim data of 1,767 unique ICD codes; the MIMIC-III consisting of 1,458 subjects with multi-modal EHR records. Results: Compared to the baselines, MixEHR-G achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-G associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge. Conclusion: The integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG led to not only competitive mortality prediction but also meaningful phenotype topics for systematic survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.
翻译:暂无翻译