Evaluating models fit to data with internal spatial structure requires specialized cross-validation (CV) approaches, because randomly selecting assessment data may produce assessment sets that are not truly independent of the data used to train the model. Many spatial CV methods have been proposed to address this problem by forcing models to extrapolate spatially when predicting the assessment set. However, there is currently little guidance on which methods yield the most accurate estimates of model performance. We conducted simulations to compare the model performance estimates produced by five common CV methods when applied to models fit to spatially structured data. We found that spatial CV approaches generally improved upon resubstitution and V-fold CV estimates, particularly approaches that combined assessment sets of spatially contiguous observations with spatial exclusion buffers. To facilitate the use of these techniques, we introduce the `spatialsample` package, which provides tooling for performing spatial CV as part of the broader tidymodels modeling framework.
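To illustrate the idea of combining spatially contiguous assessment sets with an exclusion buffer, here is a minimal Python sketch. It is not the `spatialsample` API (that package is written in R), and it does not reproduce the five methods compared in the study. It assumes square grid blocks of side `block_size`, Euclidean distances, and one assessment set per non-empty block. The function name `spatial_block_folds` is hypothetical. For each fold, observations in one block form the assessment set. Any remaining observation within `buffer` of an assessment observation is dropped from the analysis set, so the model has to extrapolate across that gap.

```python
import math
from collections import defaultdict


def spatial_block_folds(points, block_size, buffer=0.0):
    """Return a list of (analysis_idx, assessment_idx) pairs.

    Hypothetical sketch: each non-empty square grid cell of side
    `block_size` becomes one assessment set. Observations closer than
    or equal to `buffer` (Euclidean distance) to any assessment
    observation are excluded from that fold's analysis set.
    """
    # Group observation indices by the grid cell that contains them.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(points):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(i)

    folds = []
    for key in sorted(blocks):
        assessment = blocks[key]
        assess_set = set(assessment)
        analysis = []
        for i, p in enumerate(points):
            if i in assess_set:
                continue
            # Exclusion buffer: keep p only if it lies strictly farther
            # than `buffer` from every assessment observation.
            if all(math.dist(p, points[j]) > buffer for j in assessment):
                analysis.append(i)
        folds.append((analysis, assessment))
    return folds


# Usage: five observations along a line, 1-unit blocks, 1-unit buffer.
pts = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (2.5, 0.0), (3.5, 0.0)]
folds = spatial_block_folds(pts, block_size=1.0, buffer=1.0)
for analysis, assessment in folds:
    print("analysis:", analysis, "assessment:", assessment)
```

Setting `buffer=0.0` reduces this to plain spatial block CV: every observation outside the assessment block is used for training. Increasing the buffer trades training data for greater independence between the analysis and assessment sets.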