In areal unit data with missing or suppressed observations, it is desirable to build models that can predict the values that are not available. Traditional statistical methods achieve this through Bayesian hierarchical models that capture unexplained residual spatial autocorrelation via conditional autoregressive (CAR) priors, so that predictions can be made at geographically related locations. In contrast, typical machine learning approaches such as random forests ignore this residual autocorrelation and instead base predictions on complex non-linear feature-target relationships. In this paper, we propose CAR-Forest, a novel spatial prediction algorithm that combines the strengths of both approaches. By iteratively refitting a random forest together with a Bayesian CAR model within a single algorithm, CAR-Forest incorporates flexible feature-target relationships while still accounting for residual spatial autocorrelation. Our results, based on a Scottish housing price data set, show that CAR-Forest outperforms Bayesian CAR models, random forests, and the state-of-the-art hybrid approach, geographically weighted random forest, making it a strong framework for small-area spatial prediction.
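To illustrate the idea of alternating between a random forest and a spatial model, the following is a minimal sketch and not the CAR-Forest algorithm itself. It assumes an additive structure (feature-driven trend plus spatial effect), and it replaces full Bayesian inference with the closed-form posterior mean of the spatial effects under a Leroux-type CAR prior with fixed hyperparameters. The function names (`grid_adjacency`, `car_smooth`, `fit_predict_hybrid`), the parameters `rho` and `lam`, the lattice neighbourhood, and the use of out-of-bag residuals are all illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def grid_adjacency(rows, cols):
    """Binary rook-adjacency matrix for a rows x cols lattice of areal units."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            k = r * cols + c
            if c + 1 < cols:   # neighbour to the right
                W[k, k + 1] = W[k + 1, k] = 1.0
            if r + 1 < rows:   # neighbour below
                W[k, k + cols] = W[k + cols, k] = 1.0
    return W


def car_smooth(resid, observed, W, rho=0.9, lam=1.0):
    """Posterior mean of the spatial effects under a Leroux-type CAR prior.

    Assumes Gaussian residuals at observed areas and FIXED hyperparameters:
    rho (spatial dependence) and lam (noise-to-spatial variance ratio).
    A full Bayesian fit would estimate these instead.
    """
    n = len(resid)
    # Leroux precision structure: rho * (D - W) + (1 - rho) * I
    Q = rho * (np.diag(W.sum(axis=1)) - W) + (1.0 - rho) * np.eye(n)
    O = np.diag(observed.astype(float))   # selects the observed areas
    r = np.where(observed, resid, 0.0)    # unobserved residuals carry no data
    return np.linalg.solve(O + lam * Q, O @ r)


def fit_predict_hybrid(X, y, W, observed, n_iter=5, seed=0):
    """Alternate forest and spatial steps; return predictions for all areas."""
    phi = np.zeros(len(y))
    for _ in range(n_iter):
        # Forest step: learn the feature-target relationship after removing
        # the current spatial effects from the observed responses.
        rf = RandomForestRegressor(n_estimators=200, oob_score=True,
                                   random_state=seed)
        rf.fit(X[observed], y[observed] - phi[observed])
        trend = rf.predict(X)
        # Out-of-bag predictions give less optimistic residuals at
        # the training areas than in-sample predictions would.
        f_oob = trend.copy()
        f_oob[observed] = rf.oob_prediction_
        # Spatial step: smooth the forest residuals over the neighbourhood
        # graph, which also yields spatial effects for unobserved areas.
        phi = car_smooth(y - f_oob, observed, W)
    return trend + phi
```

Here `observed` is a boolean mask over the areal units, and entries of `y` at unobserved areas may be `NaN` because they are never used. Predictions for the missing areas are read from the returned vector at `~observed`.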