Distributed Gaussian process (DGP) is a popular approach to scaling GPs to big data: it divides the training data into subsets, performs local inference on each partition, and aggregates the local results into a global prediction. To combine the local predictions, a conditional independence assumption is imposed, which effectively assumes perfect diversity among the subsets. Although this assumption keeps the aggregation tractable, it is often violated in practice and generally yields poor results. In this paper, we propose a novel approach for aggregating the Gaussian experts' predictions using a Gaussian graphical model (GGM), in which the target aggregate is defined as an unobserved latent variable and the local predictions are the observed variables. We first estimate the joint distribution of the latent and observed variables using the Expectation-Maximization (EM) algorithm. The interactions between experts are encoded by the precision matrix of this joint distribution, and the aggregated prediction is obtained from the properties of the conditional Gaussian distribution. Experimental evaluations on both synthetic and real datasets show that our new method outperforms other state-of-the-art DGP approaches.
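As a minimal sketch of the conditional-Gaussian aggregation step described above: once a joint Gaussian over the latent aggregate and the experts' predictions is available, conditioning on the observed predictions gives the aggregated estimate in closed form. The precision matrix, means, and observations below are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical joint precision matrix over [y, m1, m2], where y is the
# unobserved aggregate and m1, m2 are two experts' local predictions.
# Off-diagonal entries encode the interactions between variables.
Lam = np.array([
    [2.0, -0.8, -0.6],
    [-0.8, 1.5,  0.2],
    [-0.6, 0.2,  1.2],
])
mu = np.zeros(3)              # joint mean (zero here for simplicity)
m_obs = np.array([1.0, 0.5])  # observed expert predictions

# Conditional Gaussian in precision form:
#   y | m ~ N( mu_y - Lam_yy^{-1} Lam_ym (m - mu_m),  Lam_yy^{-1} )
Lam_yy = Lam[0, 0]
Lam_ym = Lam[0, 1:]
cond_mean = mu[0] - (Lam_ym @ (m_obs - mu[1:])) / Lam_yy
cond_var = 1.0 / Lam_yy

print(cond_mean, cond_var)  # aggregated prediction and its variance
```

Note that the conditional variance depends only on the latent block of the precision matrix, which is why the precision (rather than covariance) parameterization makes this aggregation step cheap.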