In geostatistical problems with massive sample sizes, Gaussian processes (GPs) can be approximated using sparse directed acyclic graphs to achieve scalable $O(n)$ computational complexity. In these models, the data at each location are typically assumed to depend conditionally on a small set of parents, which usually includes a subset of the nearest neighbors. These methodologies often exhibit excellent empirical performance, but the lack of theoretical validation leaves little guidance on how to specify the underlying graphical model and may result in sensitivity to the choice of graph. We address these issues by introducing radial neighbors Gaussian processes and corresponding theoretical guarantees. We propose to approximate GPs using a sparse directed acyclic graph in which a directed edge connects every location to all of its neighbors within a predetermined radius. Using our novel construction, we show that one can accurately approximate a Gaussian process in Wasserstein-2 distance, with an error rate determined by the approximation radius, the spatial covariance function, and the spatial dispersion of samples. Our method is also insensitive to the specific choice of graphical model. We offer further empirical validation of our approach via applications to simulated and real-world data, showing state-of-the-art performance in posterior inference of spatial random effects.
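To make the radius-based graph concrete, the following is a minimal sketch, not the construction from the paper. It assumes a fixed ordering of the locations (the row order of `coords`) and takes as parents of each location all earlier-ordered locations within the radius, which keeps the graph acyclic but captures only the neighbors that precede a location in that ordering. The function names, the exponential covariance, and its parameters `sigma2` and `phi` are hypothetical choices made for illustration.

```python
import numpy as np


def radial_parent_sets(coords, radius):
    """Parents of location i: all earlier-ordered locations within `radius`.

    Assumption: the ordering is the row order of `coords`. Directing edges
    from earlier to later locations guarantees the graph is acyclic.
    """
    n = coords.shape[0]
    # Dense pairwise distances: O(n^2) memory, acceptable only for a small demo.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    earlier = np.arange(n)
    return [earlier[:i][dist[i, :i] <= radius] for i in range(n)]


def exp_cov(a, b, sigma2=1.0, phi=5.0):
    """Exponential covariance (an illustrative choice, not from the source)."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return sigma2 * np.exp(-phi * d)


def sample_dag_gp(coords, parents, rng, cov=exp_cov):
    """Draw one sample by visiting locations in order and conditioning
    each value only on the values at its parents."""
    n = coords.shape[0]
    w = np.zeros(n)
    for i in range(n):
        xi = coords[i:i + 1]
        c_ii = cov(xi, xi)[0, 0]
        pa = parents[i]
        if pa.size == 0:
            mean, var = 0.0, c_ii
        else:
            xp = coords[pa]
            c_ip = cov(xi, xp)[0]
            # Kriging weights: solve C_pp * weights = c_ip.
            weights = np.linalg.solve(cov(xp, xp), c_ip)
            mean = weights @ w[pa]
            var = c_ii - weights @ c_ip
        # Clip tiny negative variances caused by round-off.
        w[i] = mean + np.sqrt(max(var, 0.0)) * rng.standard_normal()
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(size=(200, 2))
    parents = radial_parent_sets(coords, radius=0.1)
    w = sample_dag_gp(coords, parents, rng)
    print("mean number of parents:", np.mean([p.size for p in parents]))
```

In this sketch a larger radius yields larger parent sets and so a higher cost per location. The dense distance matrix is used only for brevity; a spatial index would be needed to keep the neighbor search scalable.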