A model involving Gaussian processes (GPs) is introduced to simultaneously handle multi-task learning, clustering, and prediction for multiple functional data. This procedure acts both as a model-based clustering method for functional data and as a learning step for subsequent predictions on new tasks. The model is instantiated as a mixture of multi-task GPs with common mean processes. A variational EM algorithm is derived to optimise the hyper-parameters while jointly estimating the hyper-posterior distributions of the latent variables and processes. We establish explicit formulas for integrating the mean processes and the latent clustering variables within a predictive distribution, accounting for uncertainty in both. This distribution is defined as a mixture of cluster-specific GP predictions, which improves performance when dealing with group-structured data. The model handles irregular grids of observations and offers different hypotheses on the covariance structure for sharing additional information across tasks. Performance on both clustering and prediction tasks is assessed through various simulated scenarios and real datasets. The overall algorithm, called MagmaClust, is publicly available as an R package.
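To make the idea of "a mixture of cluster-specific GP predictions" concrete, the sketch below computes a standard GP posterior for a new task under each cluster's prior mean, then combines them into the first two moments of the resulting Gaussian mixture. It is a minimal illustration, not the MagmaClust implementation (which is an R package): the squared-exponential kernel, the noise level, the function names (`rbf_kernel`, `gp_predict`, `mixture_predict`) and the cluster weights are all assumptions. In particular, each cluster's mean process is treated here as a fixed, known function, whereas the model described above integrates over its hyper-posterior uncertainty; the weights stand in for the new task's posterior cluster-membership probabilities and are simply taken as given.

```python
import numpy as np


def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel on 1-D inputs (an assumed covariance choice)."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_predict(x_obs, y_obs, x_new, mean_fn, noise=0.1, **kern):
    """Standard GP posterior mean and variance at x_new for one cluster.

    mean_fn plays the role of the cluster's mean process, assumed known here.
    """
    K = rbf_kernel(x_obs, x_obs, **kern) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new, **kern)
    K_ss = rbf_kernel(x_new, x_new, **kern)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} (y - m(x_obs)) via two solves with the Cholesky factor
    resid = y_obs - mean_fn(x_obs)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    mu = mean_fn(x_new) + K_s.T @ alpha
    # diag(K_ss - K_s^T K^{-1} K_s) using v = L^{-1} K_s
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mu, var


def mixture_predict(weights, means, variances):
    """Mean and variance of sum_k w_k N(mu_k, var_k), pointwise over inputs."""
    w = np.asarray(weights, dtype=float)[:, None]
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mix_mean = np.sum(w * means, axis=0)
    # law of total variance: E[var] + Var[mean]
    mix_var = np.sum(w * (variances + means ** 2), axis=0) - mix_mean ** 2
    return mix_mean, mix_var


if __name__ == "__main__":
    # Hypothetical new task observed at three inputs, with two clusters
    x_obs = np.array([0.0, 1.0, 2.0])
    y_obs = np.array([0.5, 1.2, 0.9])
    x_new = np.linspace(0.0, 3.0, 4)
    mean_fns = [lambda x: np.zeros_like(x), lambda x: np.ones_like(x)]
    preds = [gp_predict(x_obs, y_obs, x_new, m) for m in mean_fns]
    mu, var = mixture_predict([0.3, 0.7], [p[0] for p in preds], [p[1] for p in preds])
    print(mu, var)
```

The mixture's variance includes the spread between cluster-specific means, so regions where clusters disagree receive wider predictive intervals than any single cluster would give.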