The variational autoencoder (VAE) is a popular deep latent variable model used to analyse high-dimensional datasets by learning a low-dimensional latent representation of the data. It simultaneously learns a generative model and an inference network to perform approximate posterior inference. Recently proposed extensions to VAEs that can handle temporal and longitudinal data have applications in healthcare, behavioural modelling, and predictive maintenance. However, these extensions do not account for heterogeneous data (i.e., data comprising continuous and discrete attributes), which is common in many real-life applications. In this work, we propose the heterogeneous longitudinal VAE (HL-VAE), which extends existing temporal and longitudinal VAEs to heterogeneous data. HL-VAE provides efficient inference for high-dimensional datasets and includes likelihood models for continuous, count, categorical, and ordinal data while accounting for missing observations. We demonstrate our model's efficacy on simulated as well as clinical datasets and show that it achieves competitive performance in missing value imputation and prediction.
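The abstract mentions per-attribute likelihoods for continuous, count, categorical, and ordinal data, along with handling for missing observations. The following is a minimal sketch of one common way to combine such likelihoods into a VAE reconstruction term. It assumes these distributions:

- a Gaussian for continuous data
- a Poisson for counts
- a softmax categorical for categorical data
- a cumulative-logit model for ordinal data

Missing entries are excluded from the sum. These distribution choices and all function names (`masked_loglik` and the helpers) are hypothetical illustrations, not HL-VAE's actual implementation. In a real model, the distribution parameters would be produced by the decoder from a latent sample.

```python
import math
from typing import List, Optional, Sequence


def gaussian_loglik(x: float, mean: float, log_var: float) -> float:
    """Log-density of a univariate Gaussian parameterised by mean and log-variance."""
    return -0.5 * (math.log(2 * math.pi) + log_var + (x - mean) ** 2 / math.exp(log_var))


def poisson_loglik(k: int, log_rate: float) -> float:
    """Log-pmf of a Poisson count with rate exp(log_rate)."""
    return k * log_rate - math.exp(log_rate) - math.lgamma(k + 1)


def categorical_loglik(c: int, logits: Sequence[float]) -> float:
    """Log-probability of class c under a softmax (log-sum-exp for stability)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[c] - log_norm


def ordinal_loglik(r: int, score: float, thresholds: Sequence[float]) -> float:
    """Cumulative-logit model: P(y <= j) = sigmoid(thresholds[j] - score).

    Assumes strictly increasing thresholds; levels are 0..len(thresholds).
    """
    def cdf(j: int) -> float:
        if j < 0:
            return 0.0
        if j >= len(thresholds):
            return 1.0
        return 1.0 / (1.0 + math.exp(-(thresholds[j] - score)))

    # P(y = r) = P(y <= r) - P(y <= r - 1)
    return math.log(cdf(r) - cdf(r - 1))


def masked_loglik(x: List[Optional[float]], kinds: List[str], params: List[tuple]) -> float:
    """Sum per-attribute log-likelihoods, skipping missing entries (None).

    Hypothetical helper: params[i] holds the decoder outputs for attribute i.
    """
    total = 0.0
    for value, kind, p in zip(x, kinds, params):
        if value is None:  # missing observation: contributes nothing
            continue
        if kind == "real":
            total += gaussian_loglik(value, *p)
        elif kind == "count":
            total += poisson_loglik(int(value), *p)
        elif kind == "cat":
            total += categorical_loglik(int(value), *p)
        elif kind == "ordinal":
            total += ordinal_loglik(int(value), *p)
        else:
            raise ValueError(f"unknown attribute type: {kind}")
    return total


if __name__ == "__main__":
    # One record with four attributes; the categorical one is missing.
    x = [1.2, 3, None, 2]
    kinds = ["real", "count", "cat", "ordinal"]
    params = [(1.0, 0.0), (math.log(2.0),), ([0.1, 0.5, -0.3],), (0.3, [-1.0, 0.0, 1.0])]
    print(masked_loglik(x, kinds, params))
```

Masking means the decoder is only penalised on observed attributes. The same fitted distributions can then be queried at the missing positions to impute values.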