Quantifying spatial and/or temporal associations in multivariate geolocated data of different types is achievable via spatial random effects in a Bayesian hierarchical model, but severe computational bottlenecks arise when spatial dependence is encoded as a latent Gaussian process (GP) in the increasingly common large-scale data settings on which we focus. The situation worsens in non-Gaussian models, where reduced analytical tractability creates additional hurdles to computational efficiency. In this article, we introduce Bayesian models of spatially referenced data in which the likelihood or the latent process (or both) are not Gaussian. First, we exploit the advantages of spatial processes built via directed acyclic graphs, in which case the spatial nodes enter the Bayesian hierarchy and lead to posterior sampling via routine Markov chain Monte Carlo (MCMC) methods. Second, motivated by the possible inefficiencies of popular gradient-based sampling approaches in the multivariate contexts on which we focus, we introduce the simplified manifold preconditioner adaptation (SiMPA) algorithm, which uses second-order information about the target but avoids expensive matrix operations. We demonstrate the performance and efficiency improvements of our methods relative to alternatives in extensive synthetic and real-world remote sensing and community ecology applications with large-scale data at up to hundreds of thousands of spatial locations and up to tens of outcomes. Software for the proposed methods is part of the R package 'meshed', available on CRAN.