This paper shows that decision trees constructed with Classification and Regression Trees (CART) methodology are universally consistent for additive models, even when the dimensionality scales exponentially with the sample size, under certain $\ell_1$ sparsity constraints. The consistency is universal in the sense that there are no a priori assumptions on the distribution of the input variables. Surprisingly, this adaptivity to (approximate or exact) sparsity is achieved with a single tree, as opposed to what might be expected for an ensemble. Finally, we show that these qualitative properties of individual trees are inherited by Breiman's random forests. A key step in the analysis is the establishment of an oracle inequality, which precisely characterizes the goodness-of-fit and complexity tradeoff.
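For concreteness, the setting can be sketched as follows (an illustrative schematic; the paper's precise conditions, norms, and constants may differ). The response follows an additive model
$$
Y = \sum_{j=1}^{p} f_j(X_j) + \varepsilon,
$$
where the number of predictors $p$ may grow exponentially in the sample size $n$ and the component functions $f_j$ obey an $\ell_1$-type budget such as $\sum_{j=1}^{p} \|f_j\| \le C$. The goodness-of-fit and complexity tradeoff in the oracle inequality mirrors CART's minimal cost-complexity pruning criterion $R_\alpha(T) = R(T) + \alpha\,|T|$ (Breiman et al., 1984), where $R(T)$ is the training error of tree $T$, $|T|$ its number of terminal nodes, and $\alpha \ge 0$ the complexity parameter.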
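A minimal simulation sketch, not drawn from the paper, illustrates the single-tree adaptivity to sparsity described above; the data-generating signal, the constants, and the use of scikit-learn's CART implementation with minimal cost-complexity pruning are all illustrative assumptions.

```python
# Illustrative sketch (assumptions, not the paper's experiments): a single
# CART tree with cost-complexity pruning fit to a sparse additive model.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p, s = 2000, 200, 3                # many predictors, only s relevant

X = rng.uniform(-1, 1, size=(n, p))
# Sparse additive signal: only the first s coordinates carry information.
f = np.sin(np.pi * X[:, 0]) + np.abs(X[:, 1]) + X[:, 2] ** 2
y = f + 0.5 * rng.standard_normal(n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Single CART tree, pruned via minimal cost-complexity (Breiman et al., 1984).
tree = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("single pruned tree R^2:", tree.score(X_te, y_te))
print("random forest      R^2:", forest.score(X_te, y_te))
# The pruned tree's splits should concentrate on the s relevant coordinates
# (tree_.feature stores the split feature per node, negative values at leaves).
print("features used by tree:",
      np.unique(tree.tree_.feature[tree.tree_.feature >= 0]))
```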