Treeging combines the flexible mean structure of regression trees with the covariance-based prediction strategy of kriging to form the base learner of an ensemble prediction algorithm. In so doing, it combines the strengths of the two primary types of spatial and space-time prediction models: (1) models with flexible mean structures (often machine learning algorithms) that assume independently distributed data, and (2) kriging or Gaussian process (GP) prediction models with rich covariance structures but simple mean structures. We investigate the predictive accuracy of treeging across a thorough and widely varied battery of spatial and space-time simulation scenarios, comparing it to ordinary kriging, random forest, and ensembles of ordinary kriging base learners. Treeging performs well across the board, whereas kriging suffers when dependence is weak or in the presence of spurious covariates, and random forest suffers when the covariates are less informative. Treeging also outperforms these competitors in predicting atmospheric pollutants (ozone and PM$_{2.5}$) in several case studies. We examine sensitivity to the tuning parameters (number of base learners and training data sampling proportion), finding that they follow the familiar intuition of their random forest counterparts. We include a discussion of scalability, noting that any covariance approximation technique that expedites kriging (GP prediction) may be applied similarly to expedite treeging.
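The abstract does not spell out the algorithm, so the following is only a rough sketch of one way a tree-plus-kriging base learner could be placed inside a subsampled ensemble; it is not the authors' implementation. It assumes that each base learner fits a small regression tree for the mean, applies simple kriging with a fixed exponential covariance to the tree residuals, and that the ensemble averages base learners fit to random subsamples. All function names (`fit_tree`, `krige_residuals`, `treeging_sketch`), the covariance family, and the plug-in covariance parameters are hypothetical choices made for illustration. The arguments `n_learners` and `frac` stand in for the two tuning parameters named above.

```python
import numpy as np

def fit_tree(X, y, depth=2, min_leaf=5):
    """Tiny regression tree (illustrative): nested dicts, leaves store the mean of y."""
    best = None
    if depth > 0 and len(y) >= 2 * min_leaf:
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[:-1]:
                left = X[:, j] <= t
                n_left = left.sum()
                if n_left < min_leaf or len(y) - n_left < min_leaf:
                    continue
                sse = ((y[left] - y[left].mean()) ** 2).sum() \
                    + ((y[~left] - y[~left].mean()) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, left)
    if best is None:
        return {"value": float(y.mean())}
    _, j, t, left = best
    return {"feature": j, "threshold": t,
            "left": fit_tree(X[left], y[left], depth - 1, min_leaf),
            "right": fit_tree(X[~left], y[~left], depth - 1, min_leaf)}

def predict_tree(tree, X):
    """Route each row of X down the tree and return the leaf means."""
    out = np.empty(len(X))
    for i, x in enumerate(X):
        node = tree
        while "value" not in node:
            go_left = x[node["feature"]] <= node["threshold"]
            node = node["left"] if go_left else node["right"]
        out[i] = node["value"]
    return out

def exp_cov(A, B, sill, range_):
    """Exponential covariance between two sets of locations (assumed family)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)

def krige_residuals(S_train, r, S_new, sill, range_, nugget):
    """Simple-kriging predictor of zero-mean residuals r at new locations."""
    K = exp_cov(S_train, S_train, sill, range_) + nugget * np.eye(len(r))
    k = exp_cov(S_new, S_train, sill, range_)
    return k @ np.linalg.solve(K, r)

def treeging_sketch(S, X, y, S_new, X_new, n_learners=25, frac=0.5,
                    range_=0.3, seed=0):
    """Average tree-mean + kriged-residual predictions over random subsamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = max(2, int(frac * n))
    preds = np.zeros((n_learners, len(S_new)))
    for b in range(n_learners):
        idx = rng.choice(n, size=m, replace=False)   # subsample without replacement
        tree = fit_tree(X[idx], y[idx])
        resid = y[idx] - predict_tree(tree, X[idx])
        sill = resid.var() + 1e-12                   # crude plug-in, not estimated by likelihood
        preds[b] = predict_tree(tree, X_new) + krige_residuals(
            S[idx], resid, S_new, sill, range_, 0.1 * sill)
    return preds.mean(axis=0)

# Simulated example: a covariate-driven step in the mean plus a smooth spatial signal.
rng = np.random.default_rng(1)
S = rng.uniform(size=(80, 2))                        # spatial coordinates
X = rng.normal(size=(80, 2))                         # covariates
y = np.where(X[:, 0] > 0, 2.0, -1.0) + np.sin(6 * S[:, 0]) + 0.1 * rng.normal(size=80)
pred = treeging_sketch(S[:60], X[:60], y[:60], S[60:], X[60:], n_learners=10)
```

In this sketch the covariance parameters are fixed or plugged in crudely; a fuller version would estimate them per base learner, and any covariance approximation used to speed up kriging could replace the dense solve in `krige_residuals`.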