High-resolution geospatial data are challenging because standard geostatistical models based on Gaussian processes do not scale to large data sizes. While progress has been made towards methods that can be computed more efficiently, considerably less attention has been devoted to big data methods that can describe complex relationships between several outcomes recorded at high resolution by different sensors. Our Bayesian multivariate regression models based on spatial multivariate trees (SpamTrees) achieve scalability via conditional independence assumptions on latent random effects following a treed directed acyclic graph. Information-theoretic arguments and computational efficiency considerations guide the construction of the tree and the associated efficient sampling algorithms in imbalanced multivariate settings. In addition to simulated data examples, we illustrate SpamTrees using a large climate data set that combines satellite data with land-based station data. Source code is available at https://github.com/mkln/spamtree.
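To make the conditional independence idea concrete, the following is a minimal Python sketch, not the SpamTrees implementation. It evaluates the log density of latent effects that factorize over a directed acyclic graph, so that only small conditional Gaussians are ever computed. The exponential covariance, the seven-node binary tree, the choice of conditioning each node only on its immediate parent, and all function names are illustrative assumptions; the graph construction in the paper may differ.

```python
import numpy as np

def exp_cov(a, b, phi=1.0):
    """Exponential covariance between two sets of locations (illustrative choice)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-phi * d)

def gauss_logpdf(x, mean, cov):
    """Log density of a multivariate normal, computed via a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)
    return -0.5 * (z @ z) - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def dag_logdensity(w, coords, blocks, parents, phi=1.0):
    """Sum of log p(w_block | w_parents) over the nodes of a DAG.

    blocks[j]  : indices of the locations assigned to node j (hypothetical layout)
    parents[j] : indices of the parent nodes of node j
    """
    total = 0.0
    for j, idx in enumerate(blocks):
        pa = [i for p in parents[j] for i in blocks[p]]
        c_jj = exp_cov(coords[idx], coords[idx], phi)
        if pa:
            c_jp = exp_cov(coords[idx], coords[pa], phi)
            c_pp = exp_cov(coords[pa], coords[pa], phi)
            h = np.linalg.solve(c_pp, c_jp.T).T   # regression of node j on its parents
            mean = h @ w[pa]
            cov = c_jj - h @ c_jp.T               # conditional covariance
        else:
            mean = np.zeros(len(idx))
            cov = c_jj
        total += gauss_logpdf(w[idx], mean, cov)
    return total

# Toy example: 14 locations split into 7 blocks arranged as a binary tree.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(14, 2))
blocks = [[2 * j, 2 * j + 1] for j in range(7)]
tree_parents = [[], [0], [0], [1], [1], [2], [2]]  # assumed tree structure
w = rng.normal(size=14)
tree_ll = dag_logdensity(w, coords, blocks, tree_parents)
```

Each term involves only the locations of one node and its parents, so the cost is governed by block sizes rather than by the total number of locations. If every node is instead conditioned on all preceding nodes, the same function recovers the full Gaussian process log density, which is the unscalable baseline the tree structure is meant to avoid.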