The increased demand for online prediction and the growing availability of large data sets drive the need for computationally efficient models. While exact Gaussian process regression has several favorable theoretical properties (uncertainty estimates, unlimited expressive power), its poor scaling with respect to the training set size prohibits real-time application in big-data regimes. Therefore, this paper proposes dividing local Gaussian processes, a novel, computationally efficient modeling approach based on Gaussian process regression. Through an iterative, data-driven division of the input space, they achieve in practice a computational complexity that is sublinear in the total number of training points, while providing excellent predictive distributions. A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
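To make the idea of an iterative, data-driven division of the input space concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a deterministic median split along the dimension with the largest spread, a fixed squared-exponential kernel, and a prediction taken from a single leaf model; the class name `DividingLocalGP` and the parameters `max_points`, `noise_var`, and `lengthscale` are hypothetical choices made for illustration only.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


class DividingLocalGP:
    """Binary tree whose leaves are small exact GPs (illustrative sketch)."""

    def __init__(self, dim, max_points=50, noise_var=1e-2, lengthscale=1.0):
        self.dim = dim
        self.max_points = max_points    # assumed leaf capacity
        self.noise_var = noise_var      # assumed observation noise variance
        self.lengthscale = lengthscale  # assumed fixed hyperparameter
        self.X = np.empty((0, dim))
        self.y = np.empty(0)
        self.split_dim = None
        self.split_val = None
        self.children = None            # None marks a leaf

    def _leaf_for(self, x):
        """Descend the tree to the leaf responsible for x."""
        node = self
        while node.children is not None:
            node = node.children[int(x[node.split_dim] >= node.split_val)]
        return node

    def update(self, x, y):
        """Add one training pair; divide the leaf if it grows too large."""
        x = np.asarray(x, dtype=float)
        leaf = self._leaf_for(x)
        leaf.X = np.vstack([leaf.X, x])
        leaf.y = np.append(leaf.y, y)
        if len(leaf.y) > leaf.max_points:
            leaf._divide()

    def _divide(self):
        """Split this leaf at the median of its widest input dimension."""
        spread = self.X.max(axis=0) - self.X.min(axis=0)
        split_dim = int(np.argmax(spread))
        split_val = float(np.median(self.X[:, split_dim]))
        go_right = self.X[:, split_dim] >= split_val
        if go_right.all() or not go_right.any():
            return  # degenerate split (e.g. duplicated inputs): stay a leaf
        self.split_dim, self.split_val = split_dim, split_val
        self.children = []
        for mask in (~go_right, go_right):
            child = DividingLocalGP(self.dim, self.max_points,
                                    self.noise_var, self.lengthscale)
            child.X, child.y = self.X[mask], self.y[mask]
            self.children.append(child)
        # Inner nodes only route queries, so they drop their data.
        self.X, self.y = np.empty((0, self.dim)), np.empty(0)

    def predict(self, x):
        """Posterior mean and latent variance from the responsible leaf."""
        x = np.asarray(x, dtype=float)
        leaf = self._leaf_for(x)
        if len(leaf.y) == 0:
            return 0.0, 1.0  # zero-mean, unit-variance prior
        K = rbf_kernel(leaf.X, leaf.X, leaf.lengthscale)
        K += leaf.noise_var * np.eye(len(leaf.y))
        k = rbf_kernel(leaf.X, x[None, :], leaf.lengthscale)[:, 0]
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, leaf.y))
        v = np.linalg.solve(L, k)
        return float(k @ alpha), float(1.0 - v @ v)
```

In this sketch, every update and prediction touches one leaf holding at most `max_points` samples, so the cost of the GP computations does not grow with the total number of training points; only the tree descent does, and it grows slowly as long as the splits stay reasonably balanced. A short usage example: calling `model.update([x], y)` for a stream of samples and then `model.predict([1.0])` returns a mean and a variance computed from the local model responsible for that input.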