Gaussian processes (GPs) are non-linear probabilistic models popular in many applications. However, na\"ive GP realizations require quadratic memory to store the covariance matrix and cubic computation to perform inference or evaluate the likelihood function. These bottlenecks have driven considerable investment in the development of approximate GP alternatives that scale to the large data sizes common in modern data-driven applications. In this manuscript we present MuyGPs, a novel and efficient GP hyperparameter estimation method. MuyGPs builds upon prior methods that exploit the nearest-neighbor structure of the data, and uses leave-one-out cross-validation to optimize covariance (kernel) hyperparameters without realizing a possibly expensive likelihood. We describe our model and methods in detail, and compare our implementations against state-of-the-art competitors on a benchmark spatial statistics problem. We show that our method outperforms all known competitors in both time-to-solution and the root mean squared error of the predictions.
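As a rough illustration of the idea summarized above (the notation is introduced here only for exposition and is not taken from the paper's formal development), the leave-one-out criterion can be written schematically as
\[
\hat{\theta} \;=\; \operatorname*{arg\,min}_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \Big( y_i \,-\, K_\theta(x_i, N_i)\, K_\theta(N_i, N_i)^{-1}\, y_{N_i} \Big)^2 ,
\]
where $N_i$ denotes the nearest-neighbor index set of $x_i$, $K_\theta$ the kernel evaluated under candidate hyperparameters $\theta$, and $y_{N_i}$ the responses at those neighbors. Each training response is predicted from only its nearest neighbors, so the leave-one-out prediction error replaces the full likelihood as the optimization criterion and only small linear systems must be solved.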