Clinical patient records are an example of high-dimensional data that are typically collected from disparate sources, are described by multiple likelihoods, and contain noisy as well as missing values. In this work, we propose an unsupervised generative model that learns a low-dimensional latent representation of the observations while making use of all available data in a heterogeneous data setting with missing values. We improve upon the existing Gaussian process latent variable model (GPLVM) by incorporating multiple likelihoods and back-constraints parameterised by deep neural networks, which yields a non-linear dimensionality reduction technique for heterogeneous data. In addition, we develop a variational inference method for our model that uses numerical quadrature. We demonstrate the effectiveness of our model and compare it against existing GPLVM methods on a standard benchmark dataset as well as on clinical data of Parkinson's disease patients treated at HUS Helsinki University Hospital.
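To illustrate how numerical quadrature can enter variational inference with several likelihoods and missing values, the following is a minimal sketch and not the implementation used in this work. It assumes a factorised Gaussian variational posterior q(f) = N(mean, var) over the latent function values, uses Gauss-Hermite quadrature as the quadrature rule, and encodes missing entries as NaN. All function names, the fixed noise variance, and the choice of a Gaussian and a Bernoulli likelihood are hypothetical choices made for illustration.

```python
import numpy as np


def expected_log_lik(log_lik, y, mean, var, n_points=20):
    """Gauss-Hermite estimate of E_q[log p(y | f)] with q(f) = N(mean, var)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # The substitution f = mean + sqrt(2 * var) * x turns the Gaussian
    # expectation into an integral against the Hermite weight exp(-x^2).
    f = mean[..., None] + np.sqrt(2.0 * var[..., None]) * nodes
    values = log_lik(y[..., None], f)
    return (values * weights).sum(axis=-1) / np.sqrt(np.pi)


def gaussian_log_lik(y, f, noise_var=0.1):
    # Continuous feature; noise_var is an assumed, fixed noise variance.
    return -0.5 * np.log(2.0 * np.pi * noise_var) - 0.5 * (y - f) ** 2 / noise_var


def bernoulli_log_lik(y, f):
    # Binary feature with a logistic link, written in a numerically stable form:
    # log sigmoid(f) when y == 1 and log sigmoid(-f) when y == 0.
    return -np.logaddexp(0.0, -(2.0 * y - 1.0) * f)


def masked_expected_log_lik(log_lik, y, mean, var):
    """Sum the expected log-likelihood over observed entries only (NaN = missing)."""
    observed = ~np.isnan(y)
    y_filled = np.where(observed, y, 0.0)  # placeholder value, masked out below
    terms = expected_log_lik(log_lik, y_filled, mean, var)
    return np.where(observed, terms, 0.0).sum()


# Usage: one continuous and one binary feature over three observations,
# each feature with one missing entry.
mean = np.array([0.3, -1.2, 0.8])
var = np.array([0.5, 0.2, 1.0])
y_continuous = np.array([0.1, np.nan, 1.4])
y_binary = np.array([1.0, 0.0, np.nan])

data_term = (
    masked_expected_log_lik(gaussian_log_lik, y_continuous, mean, var)
    + masked_expected_log_lik(bernoulli_log_lik, y_binary, mean, var)
)
```

In this sketch, `data_term` plays the role of the data-fit part of a variational lower bound; the KL terms and the optimisation of the variational parameters are omitted. Masking, rather than imputing, is one simple way to let every observed entry contribute while the missing ones are ignored.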