Gaussian Process regression is a kernel method that has been successfully adopted in many real-life applications. Recently, there has been growing interest in extending this method to non-Euclidean input spaces, such as the one considered in this paper, which consists of probability measures. Although a positive definite kernel can be defined by using a suitable distance (the Wasserstein distance), the common procedure for learning the Gaussian Process model can fail due to numerical issues. These issues arise earlier and more frequently than in the case of a Euclidean input space and, as demonstrated in this paper, cannot be avoided by adding artificial noise (the nugget effect), as is usually done. This paper uncovers the main reason for these issues, namely a non-stationarity relationship between the Wasserstein-based squared exponential kernel and its Euclidean-based counterpart. As a relevant result, the Gaussian Process model is learned by treating the input space as Euclidean, and an algebraic transformation, based on the uncovered relation, is then used to transform it into a non-stationary, Wasserstein-based Gaussian Process model over probability measures. This algebraic transformation is simpler than the log-exp maps used for data belonging to Riemannian manifolds, which were recently extended to account for the pseudo-Riemannian structure of an input space equipped with the Wasserstein distance.
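To make the objects named above concrete, the following is a minimal sketch and not the paper's implementation. It assumes the simplest possible setting: each input is a one-dimensional empirical measure given by the same number of equally weighted samples, for which the 2-Wasserstein distance reduces to a comparison of sorted samples. The function names (`wasserstein2_1d`, `wse_kernel_matrix`, `cholesky_with_nugget`), the length scale, and the nugget value are all hypothetical choices made for illustration. The sketch builds a Wasserstein-based squared exponential kernel matrix and applies the usual nugget before a Cholesky factorization; it does not reproduce the numerical failures or the algebraic transformation discussed in the paper.

```python
import numpy as np


def wasserstein2_1d(x, y):
    """2-Wasserstein distance between two 1-D empirical measures.

    Assumption: both measures have the same number of samples with
    uniform weights, so the optimal coupling matches sorted samples.
    """
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y_sorted = np.sort(np.asarray(y, dtype=float))
    if x_sorted.shape != y_sorted.shape:
        raise ValueError("this sketch requires equal sample counts")
    return np.sqrt(np.mean((x_sorted - y_sorted) ** 2))


def wse_kernel_matrix(samples, length_scale=1.0):
    """Squared exponential kernel matrix with the Wasserstein distance
    in place of the Euclidean one: k(P, Q) = exp(-W2(P, Q)^2 / (2 l^2))."""
    n = len(samples)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d = wasserstein2_1d(samples[i], samples[j])
            K[i, j] = np.exp(-d ** 2 / (2.0 * length_scale ** 2))
    return K


def cholesky_with_nugget(K, nugget=1e-8):
    """Add a small diagonal term (the nugget) and factorize.

    np.linalg.cholesky raises LinAlgError if the regularized matrix
    is still not numerically positive definite.
    """
    return np.linalg.cholesky(K + nugget * np.eye(K.shape[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three toy "probability measures", each represented by 50 samples.
    samples = [rng.normal(loc=m, size=50) for m in (0.0, 0.5, 1.0)]
    K = wse_kernel_matrix(samples, length_scale=1.0)
    L = cholesky_with_nugget(K)
    print("condition number of K:", np.linalg.cond(K))
```

Monitoring the condition number of the kernel matrix, as in the last line, is one common way to see when the factorization is close to failing; the choice of diagnostic here is an assumption of this sketch rather than something stated in the abstract.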