In electronic health records (EHRs), latent subgroups of patients may exhibit distinctive patterns in their longitudinal health trajectories. For such data, growth mixture models (GMMs) classify patients into latent classes based on individual trajectories and hypothesized risk factors. However, the application of GMMs is hindered by a missing data problem particular to EHRs, which arises from two patient-led missing data processes: the visit process, and the response process for an EHR variable conditional on a patient visiting the clinic. If either process is associated with the process generating the longitudinal outcomes, then valid inferences require accounting for a nonignorable missing data mechanism. We propose a Bayesian shared parameter model that uses a discrete latent class variable to link GMMs of multiple longitudinal health outcomes, the visit process, and the response process of each outcome given a visit. Our focus is on multiple longitudinal health outcomes for which there can be a clinically prescribed visit schedule. We demonstrate our model on EHR measurements of early childhood weight and height z-scores. Using simulations, we illustrate the statistical properties of our method with respect to subgroup-specific and marginal inferences. We developed the R package EHRMiss for model fitting, selection, and checking.
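As a rough illustration of how a discrete latent class can tie the three processes together, the joint distribution for one patient might be factorized as below. This is a minimal sketch and not the paper's exact specification: the notation is assumed here for illustration ($C_i$ for the latent class of patient $i$, $w_i$ for risk factors, $d_{ij}$ for the visit indicator at scheduled window $j$, $m_{ijq}$ for the indicator that outcome $q$ is recorded given a visit, and $y_i$ for the longitudinal outcomes), as is the conditional independence of the three processes given $C_i$.

```latex
% Hypothetical notation; a sketch of a latent-class shared parameter factorization.
\begin{align*}
p(y_i, d_i, m_i \mid w_i)
  &= \sum_{k=1}^{K} \Pr(C_i = k \mid w_i)\;
     \underbrace{p(y_i \mid C_i = k)}_{\text{GMM for the outcomes}} \\
  &\quad \times
     \underbrace{\prod_{j=1}^{J} p(d_{ij} \mid C_i = k)}_{\text{visit process}}\;
     \underbrace{\prod_{j:\, d_{ij} = 1}\, \prod_{q=1}^{Q}
       p(m_{ijq} \mid d_{ij} = 1,\, C_i = k)}_{\text{response process given a visit}}
\end{align*}
```

Under this kind of factorization, the visit and response indicators carry information about class membership, which is how the nonignorable missingness can enter the inference about the trajectories.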