Binary geospatial data is commonly analyzed with generalized linear mixed models that combine a linear fixed covariate effect with a Gaussian Process (GP)-distributed spatial random effect and relate them to the response through a link function. The assumption of linear covariate effects is severely restrictive. Random Forests (RF) are increasingly used for non-linear modeling of spatial data, but current extensions of RF for binary spatial data depart from the mixed model setup, relinquishing inference on the fixed effects and the other advantages of using a GP. We propose RF-GP, which uses Random Forests to estimate the non-linear covariate effect and Gaussian Processes to model the spatial random effects directly within the generalized mixed model framework. We observe and exploit the equivalence of the Gini impurity measure and the least squares loss to propose an extension of RF for binary data that accounts for spatial dependence. We then propose a novel link inversion algorithm that leverages the properties of GPs to estimate the covariate effects and offer spatial predictions. RF-GP outperforms existing RF methods for estimation and prediction on both simulated and real-world data. We establish consistency of RF-GP for a general class of $\beta$-mixing binary processes that includes common choices such as spatial Mat\'ern GPs and autoregressive processes.
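The Gini/least-squares equivalence mentioned above can be checked numerically for ordinary, non-spatial splits. For 0/1 responses in a node with class-1 proportion $p$, the two-class Gini impurity is $2p(1-p)$, while the residual sum of squares around the node mean is $n p(1-p)$. Summed over the children of a split, the total residual sum of squares therefore equals $n/2$ times the weighted Gini impurity, so both criteria rank candidate splits identically. The sketch below assumes simulated data and uses illustrative helper names (`gini`, `sse`, `split_scores`) that are not from the paper. It shows only the classical identity and does not reproduce the spatially adjusted splitting rule of RF-GP.

```python
import numpy as np

def gini(y):
    """Two-class Gini impurity of a 0/1 vector: 1 - p^2 - (1-p)^2 = 2p(1-p)."""
    p = y.mean()
    return 1.0 - p**2 - (1.0 - p)**2

def sse(y):
    """Least squares loss: residual sum of squares around the node mean."""
    return float(((y - y.mean()) ** 2).sum())

def split_scores(x, y, threshold):
    """Return (weighted Gini impurity, total SSE) of the children of x <= threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    n = len(y)
    weighted_gini = sum(len(c) / n * gini(c) for c in (left, right))
    total_sse = sse(left) + sse(right)
    return weighted_gini, total_sse

# Simulated binary responses with a non-linear (logistic) dependence on x.
rng = np.random.default_rng(0)
x = rng.uniform(size=200)
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-4 * (x - 0.5)))).astype(float)

# Candidate thresholds: midpoints between consecutive sorted unique x values,
# so both children of every split are non-empty.
xs = np.unique(x)
thresholds = (xs[:-1] + xs[1:]) / 2
scores = np.array([split_scores(x, y, t) for t in thresholds])

# Total SSE equals n/2 times the weighted Gini impurity for every candidate split.
print(np.allclose(scores[:, 1], len(y) * scores[:, 0] / 2))
# Hence the SSE-optimal split also attains the minimal weighted Gini impurity.
print(np.isclose(scores[scores[:, 1].argmin(), 0], scores[:, 0].min()))
```

Because the proportionality constant $n/2$ does not depend on the split, a least-squares tree-growing routine can be reused for binary responses. This is the property the abstract refers to when extending RF to spatially dependent binary data.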