Longitudinal datasets, measured repeatedly over time from individual subjects, arise in many biomedical, psychological, social, and other studies. A common approach to analysing high-dimensional data that contains missing values is to learn a low-dimensional representation using variational autoencoders (VAEs). However, standard VAEs assume that the learnt representations are i.i.d. and fail to capture the correlations between data samples. We propose the Longitudinal VAE (L-VAE), which uses a multi-output additive Gaussian process (GP) prior to extend the VAE's capability to learn structured low-dimensional representations imposed by auxiliary covariate information, and we derive a new KL divergence upper bound for such GPs. Our approach can simultaneously accommodate both time-varying shared and random effects, produce structured low-dimensional representations, disentangle the effects of individual covariates or their interactions, and achieve highly accurate predictive performance. We compare our model against previous methods on synthetic as well as clinical datasets and demonstrate state-of-the-art performance in data imputation, reconstruction, and long-term prediction tasks.
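To make the idea of an additive GP prior over covariates more concrete, the following is a minimal NumPy sketch, not the L-VAE implementation. It assumes a squared-exponential kernel over time for the shared effect and a product of a categorical kernel (subject id) with a squared-exponential kernel for the subject-specific random effect. The function names, kernel choices, lengthscales, and toy data are all illustrative assumptions.

```python
import numpy as np

def se_kernel(x, y, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D arrays x and y."""
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def categorical_kernel(a, b):
    """1.0 where the two category labels match, 0.0 otherwise."""
    return (a[:, None] == b[None, :]).astype(float)

def additive_kernel(t, ids, ls_shared=2.0, ls_subject=1.0):
    """Sum of a shared time effect and a subject-specific time effect.

    Hypothetical decomposition for illustration only:
      shared  : correlates every sample through time alone
      subject : correlates samples only within the same subject
    """
    shared = se_kernel(t, t, ls_shared)
    subject = categorical_kernel(ids, ids) * se_kernel(t, t, ls_subject)
    return shared + subject

# Toy longitudinal design: 3 subjects observed at 4 time points each.
t = np.tile(np.arange(4.0), 3)
ids = np.repeat(np.arange(3), 4)
K = additive_kernel(t, ids)

# Draw a 2-dimensional latent trajectory from the GP prior,
# using an independent GP with the same covariance for each latent dimension.
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(K + 1e-6 * np.eye(len(t)))  # jitter for stability
z = chol @ rng.standard_normal((len(t), 2))
```

Under these assumptions, latent codes from the same subject are more strongly correlated than codes from different subjects at the same time point, which is the kind of structure an i.i.d. prior cannot express.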