Disease progression modeling (DPM) uses mathematical frameworks to quantify how the severity of a disease progresses over time. DPM is useful for predicting health states, categorizing disease stages, and assessing patients' disease trajectories. With the wider availability of electronic health records (EHRs) and the broad application of data-driven machine learning methods, DPM has attracted much attention, yet two major challenges remain: (i) because EHRs are irregular, heterogeneous, and exhibit long-term dependencies, most existing DPM methods may not provide comprehensive patient representations; (ii) many records in EHRs may be irrelevant to the target disease, and most existing models learn to focus on relevant information automatically instead of explicitly capturing the target-relevant events, which may leave the learned model suboptimal. To address these two issues, we propose the Temporal Clustering with External Memory Network (TC-EMNet) for DPM, which groups patients with similar trajectories to form disease clusters (stages). TC-EMNet uses a variational autoencoder (VAE) to capture the internal complexity of the input data and an external memory network to capture long-term information; both help produce comprehensive patient states. Finally, the k-means algorithm is applied to the extracted patient states to cluster them and capture disease progression. Experiments on two real-world datasets show that our model achieves clustering performance competitive with state-of-the-art methods and identifies clinically meaningful clusters. Visualization of the extracted patient states shows that the proposed model generates better patient states than the baselines.
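As a rough illustration of the final clustering step only, the sketch below runs plain k-means (Lloyd's algorithm) over fixed-length patient-state vectors. It is not the TC-EMNet implementation: the VAE encoder and the external memory network are not shown, the 2-D vectors are synthetic stand-ins for extracted patient states, and the names `kmeans`, `sq_dist`, and `states` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over a list of equal-length float vectors."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic stand-ins for patient states (assumption: two well-separated
# groups, playing the role of two disease stages).
rng = random.Random(0)
early = [[rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)] for _ in range(20)]
late = [[5 + rng.uniform(-0.5, 0.5), 5 + rng.uniform(-0.5, 0.5)] for _ in range(20)]
states = early + late

labels, centroids = kmeans(states, k=2)
```

In this toy setting each cluster label plays the role of a disease stage. In practice the input vectors would come from the trained encoder, and the number of clusters would be chosen for the dataset at hand.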