Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees while enabling them to better capture linear relationships, which standard decision trees struggle to do. However, most existing methods for fitting linear model trees are time-consuming and therefore do not scale to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an $L^2$ boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for \textbf{PI}ecewise \textbf{L}inear \textbf{O}rganic \textbf{T}ree, where `organic' refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data are generated by a linear model, the convergence rate is polynomial.
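To make the combination of $L^2$ boosting and a model selection rule more concrete, the following is a minimal sketch of what fitting linear models inside a single node could look like. It is not PILOT's actual procedure: the helper names (`bic`, `boost_step`, `fit_node`), the restriction to two candidate models (a constant fit or a simple linear fit on one predictor), and the BIC-style score used as the selection rule are all illustrative assumptions. Each step fits the current residuals (componentwise $L^2$ boosting under squared loss), keeps the candidate with the lowest score, and subtracts its fit from the residuals.

```python
import numpy as np


def bic(rss, n, k):
    # Hypothetical BIC-style score for a Gaussian fit with k parameters
    # (lower is better); rss is floored to avoid log(0) on perfect fits.
    return n * np.log(max(rss, 1e-12) / n) + k * np.log(n)


def boost_step(X, r):
    """One hypothetical L2-boosting step in a node: fit the residuals r
    with either a constant or a simple linear model on one predictor,
    and keep whichever candidate has the lowest BIC-style score."""
    n, p = X.shape
    mean = r.mean()
    # Start with the constant model (1 parameter).
    best = ("con", None, mean, 0.0, bic(np.sum((r - mean) ** 2), n, 1))
    for j in range(p):
        x = X[:, j]
        xc = x - x.mean()
        sxx = np.dot(xc, xc)
        if sxx == 0:
            continue  # constant predictor: no linear fit possible
        # Least-squares slope and intercept of r on predictor j (2 parameters).
        slope = np.dot(xc, r - mean) / sxx
        intercept = mean - slope * x.mean()
        rss = np.sum((r - intercept - slope * x) ** 2)
        score = bic(rss, n, 2)
        if score < best[4]:
            best = ("lin", j, intercept, slope, score)
    return best


def fit_node(X, y, max_steps=5):
    """Repeatedly fit residuals; stop once the constant model is selected."""
    r = y.astype(float).copy()
    steps = []
    for _ in range(max_steps):
        kind, j, a, b, _ = boost_step(X, r)
        steps.append((kind, j, a, b))
        if kind == "con":
            r -= a  # absorb the remaining mean and stop
            break
        r -= a + b * X[:, j]  # remove the selected linear fit
    return steps, r
```

For example, with `y = 2 * X[:, 1] + 1`, the first step should select a linear fit on predictor 1 with slope close to 2 and intercept close to 1, after which the residuals are essentially zero and the constant model wins. The model selection rule is what keeps such a sketch regularized: a linear term is only added when its reduction in residual error outweighs the extra parameter penalty.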