Evaluating the predictive performance of species distribution models (SDMs) under realistic deployment scenarios requires careful handling of spatial and temporal dependencies in the data. Cross-validation (CV) is the standard approach for model evaluation, but its design strongly influences the validity of performance estimates. When SDMs are intended for spatial or temporal transfer, random CV can lead to overoptimistic results due to spatial autocorrelation (SAC) among neighboring observations. We benchmark four machine learning algorithms (GBM, XGBoost, LightGBM, Random Forest) on two real-world presence-absence datasets, a temperate plant and an anadromous fish, using multiple CV designs: random, spatial, spatio-temporal, environmental, and forward-chaining. Two training data usage strategies (LAST FOLD and RETRAIN) are evaluated, with hyperparameter tuning performed within each CV scheme. Model performance is assessed on independent out-of-time test sets using AUC, MAE, and correlation metrics. Random CV overestimates AUC by up to 0.16 and produces MAE values up to 80 percent higher than spatially blocked alternatives. Blocking at the empirical SAC range substantially reduces this bias. Training strategy affects evaluation outcomes: LAST FOLD yields smaller validation-test discrepancies under strong SAC, while RETRAIN achieves higher test AUC when SAC is weaker. Boosted ensemble models consistently perform best under spatially structured CV designs. We recommend a robust SDM workflow based on SAC-aware blocking, blocked hyperparameter tuning, and external temporal validation to improve reliability under spatial and temporal shifts.
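The core idea behind spatially blocked CV can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes projected coordinates, square grid blocks, and a block side length set to the empirical SAC range. The names `spatial_block_folds` and `train_test_indices`, the toy coordinates, and the 25 km block size are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Assign each observation to a CV fold so whole spatial blocks stay together.

    coords     : list of (x, y) tuples in projected units (e.g. metres)
    block_size : side length of the square blocks; assumed here to be set
                 to the empirical SAC range
    n_folds    : number of CV folds
    """
    # Map every observation to the grid cell (block) that contains it.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(i)

    # Shuffle the blocks (not the points), then deal them to folds round-robin.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    fold_of = [None] * len(coords)
    for rank, key in enumerate(keys):
        for i in blocks[key]:
            fold_of[i] = rank % n_folds
    return fold_of


def train_test_indices(fold_of, k):
    """Return (train, test) index lists for fold k."""
    test = [i for i, f in enumerate(fold_of) if f == k]
    train = [i for i, f in enumerate(fold_of) if f != k]
    return train, test


# Toy data: 200 random points in a 100 km x 100 km square (illustrative only).
rng = random.Random(42)
coords = [(rng.uniform(0, 100_000), rng.uniform(0, 100_000)) for _ in range(200)]
fold_of = spatial_block_folds(coords, block_size=25_000, n_folds=4)
train_idx, test_idx = train_test_indices(fold_of, 0)
```

Because folds are built from whole blocks, neighboring observations inside one block never straddle the train/test split, which is the property random CV lacks. Hyperparameter tuning would reuse the same fold assignment so that tuning and evaluation are both blocked.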


