In nonparametric regression and spatial process modeling, the inputs commonly fall in a restricted subset of Euclidean space. For example, the locations at which spatial data are collected may be restricted to a narrow non-linear subset, such as the region near the edge of a lake. Typical kernel-based methods do not take into account the intrinsic geometry of the domain across which observations are collected, and so may produce sub-optimal results. In this article, we address this problem in the context of Gaussian process (GP) models, proposing a new class of diffusion-based GPs (DB-GPs), which learn a covariance that respects the geometry of the input domain. We use the term "diffusion-based" because the idea is to measure intrinsic distances between inputs in a restricted domain via a diffusion process. As the heat kernel underlying this covariance is computationally intractable, we approximate the covariance using finitely many eigenpairs of the graph Laplacian (GL). Our proposed algorithm has the same order of computational complexity as current GP algorithms that use simple covariance kernels. We provide substantial theoretical support for the DB-GP methodology, and illustrate performance gains through toy examples, simulation studies, and applications to ecology data.
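To make the eigenpair approximation concrete, the following is a minimal sketch of how a heat-kernel-like covariance could be assembled from a truncated spectral decomposition of a graph Laplacian. It is an illustration under stated assumptions, not the algorithm of the article: the Gaussian affinity, the symmetric normalized Laplacian, the function name `graph_laplacian_heat_kernel`, and the parameters `epsilon` (affinity bandwidth), `t` (diffusion time), and `num_eigenpairs` (truncation level) are all hypothetical choices made for this example.

```python
import numpy as np


def graph_laplacian_heat_kernel(X, epsilon, t, num_eigenpairs):
    """Approximate a heat-kernel covariance from graph Laplacian eigenpairs.

    Hypothetical helper: the affinity, normalization, and parameter names
    are assumptions made for this sketch, not taken from the article.
    """
    # Pairwise squared Euclidean distances between all inputs.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinity matrix with bandwidth epsilon.
    W = np.exp(-sq_dists / (4.0 * epsilon**2))
    degrees = W.sum(axis=1)
    # Symmetric normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    L = np.eye(X.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(L)
    lam = eigvals[:num_eigenpairs]
    V = eigvecs[:, :num_eigenpairs]
    # Truncated spectral sum: K = sum_i exp(-t * lam_i) v_i v_i^T.
    K = (V * np.exp(-t * lam)) @ V.T
    return K, lam


# Toy inputs restricted to a one-dimensional spiral embedded in the plane.
rng = np.random.default_rng(0)
theta = np.sort(rng.uniform(0.0, 4.0 * np.pi, size=200))
X = np.column_stack([theta * np.cos(theta), theta * np.sin(theta)])

K, lam = graph_laplacian_heat_kernel(X, epsilon=1.0, t=5.0, num_eigenpairs=20)
```

Because only the smallest `num_eigenpairs` eigenvalues enter the sum, the resulting matrix is symmetric and positive semi-definite by construction, so it can serve as a GP covariance over the observed inputs. For larger data sets, a sparse eigensolver would be the natural substitute for the dense `np.linalg.eigh` call used here.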