Regression trees and their ensembles are popular tools for non-parametric regression: they combine strong predictive performance with interpretable estimators. To improve their utility for locally smooth response surfaces, we study regression trees and random forests with linear aggregation functions. We introduce a new algorithm that finds the best axis-aligned split when linear aggregation functions are fit on the resulting nodes, and we offer a quasilinear-time implementation. We apply the algorithm to several simulated and real-world data sets. We showcase its favorable performance in an extensive simulation study and demonstrate its improved interpretability using a large get-out-the-vote randomized controlled trial. We also provide a software package that implements several tree-based estimators with linear aggregation functions and includes tools for inference.
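To make the idea of an axis-aligned split with linear aggregation functions concrete, the following is a minimal brute-force sketch, not the quasilinear-time algorithm described above. For every feature and every candidate threshold it refits a linear model on each side and keeps the split with the smallest total squared error. The function names `ridge_sse` and `best_linear_split`, the `min_leaf` parameter, and the small ridge penalty `lam` (added only for numerical stability) are assumptions made for illustration and are not taken from the accompanying software package.

```python
import numpy as np


def ridge_sse(X, y, lam=1e-8):
    """Sum of squared errors of a linear fit with intercept.

    A small ridge penalty (hypothetical default) keeps the normal
    equations solvable; the intercept is left unpenalized.
    """
    A = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    beta = np.linalg.solve(A.T @ A + penalty, A.T @ y)
    resid = y - A @ beta
    return float(resid @ resid)


def best_linear_split(X, y, min_leaf=5, lam=1e-8):
    """Exhaustively search axis-aligned splits.

    Returns (feature index, threshold, total SSE) for the split that
    minimizes the summed SSE of the two child linear fits. Each
    candidate refits both children from scratch, so this is far slower
    than a quasilinear implementation.
    """
    n, d = X.shape
    best = (None, None, np.inf)
    for j in range(d):
        order = np.argsort(X[:, j])
        xs = X[order, j]
        for i in range(min_leaf, n - min_leaf + 1):
            if xs[i - 1] == xs[i]:
                continue  # no threshold separates equal values
            left, right = order[:i], order[i:]
            sse = ridge_sse(X[left], y[left], lam) + ridge_sse(X[right], y[right], lam)
            if sse < best[2]:
                best = (j, 0.5 * (xs[i - 1] + xs[i]), sse)
    return best
```

On a response that is piecewise linear in one feature, this search should recover the breakpoint as the split, which is the situation where a linear aggregation function is expected to help over a constant leaf prediction. Refitting each child from scratch is what makes the sketch slow; an efficient implementation would update the fits incrementally as the threshold moves.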