Random Forest (RF) is a well-known data-driven algorithm applied in several fields thanks to its flexibility in modeling the relationship between the response variable and the predictors, even in the presence of strong non-linearities. In environmental applications, the phenomenon of interest often exhibits spatial and/or temporal dependence that standard RF does not explicitly take into account. In this work, we propose a taxonomy that classifies strategies according to when (Pre-, In- and/or Post-processing) they incorporate spatial information into regression RF. Moreover, we provide a systematic review and classify the most recent strategies adopted to "adjust" regression RF to spatially dependent data, following the criteria of the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA). PRISMA is a reproducible methodology for collecting and processing the existing literature on a specified topic from different sources. It starts with a query and ends with a set of scientific documents to review: we performed an online query on 25 October 2022 and, in the end, 32 documents were considered for review. The methodological strategies employed and the application fields considered in these 32 documents are described and discussed.