This paper presents a new approach to tree-based regression methods, such as a single regression tree, random forests, and gradient boosting, in settings involving correlated data. We show the problems that arise when standard tree-based regression models, which ignore the correlation structure, are applied to such data. Our approach explicitly takes the correlation structure into account in the splitting criterion, the stopping rules, and the fitted values in the leaves, which requires substantial modifications to the standard methodology. Simulation experiments and real data analyses support the superiority of our approach over tree-based models that do not account for the correlation.
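To make the idea of a correlation-aware split concrete, the following is a minimal sketch and not necessarily the criterion used in the paper. It assumes the error covariance matrix V is known, and it replaces the usual within-node sum of squares with a generalized least squares (GLS) loss, so that both the leaf values and the split score depend on V. The function names `gls_leaf_values`, `gls_loss`, and `best_split` are hypothetical and introduced only for illustration.

```python
import numpy as np


def gls_leaf_values(y, Z, V_inv):
    """GLS estimate of the leaf values: (Z' V^-1 Z)^-1 Z' V^-1 y.

    Z is the n x k leaf-membership (indicator) matrix. V_inv is the inverse
    of the error covariance, which is ASSUMED known in this sketch.
    """
    A = Z.T @ V_inv @ Z
    b = Z.T @ V_inv @ y
    return np.linalg.solve(A, b)


def gls_loss(y, Z, V_inv):
    """Correlation-weighted residual sum of squares: r' V^-1 r."""
    r = y - Z @ gls_leaf_values(y, Z, V_inv)
    return float(r @ V_inv @ r)


def best_split(x, y, V):
    """Scan midpoints of one feature and return (threshold, loss).

    With V equal to the identity this reduces to the standard
    sum-of-squares criterion of an ordinary regression tree.
    """
    V_inv = np.linalg.inv(V)
    best_t, best_loss = None, np.inf
    xs = np.unique(x)  # sorted distinct feature values
    for t in (xs[:-1] + xs[1:]) / 2:
        left = x <= t
        # Two-column indicator matrix: left child, right child.
        Z = np.column_stack([left, ~left]).astype(float)
        loss = gls_loss(y, Z, V_inv)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t, best_loss
```

Calling `best_split(x, y, np.eye(len(y)))` reproduces the standard uncorrelated criterion, while passing a block or spatial covariance for V changes both the chosen threshold and the fitted leaf values. In practice V would have to be estimated, which this sketch does not attempt.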