Rule ensembles are designed to provide a useful trade-off between predictive accuracy and model interpretability. However, the myopic and random search components of current rule ensemble methods can compromise this goal: they often require more rules than necessary to reach a given accuracy level, or can even fail outright to accurately model a distribution that is in fact well described by a few rules. Here, we present a novel approach that aims to fit rule ensembles of maximal predictive power for a given ensemble size (and thus model comprehensibility). In particular, we present an efficient branch-and-bound algorithm that optimally solves the per-rule objective function of the popular second-order gradient boosting framework. Our main insight is that the boosting objective can be tightly bounded in time linear in the number of covered data points. Together with an additional novel pruning technique related to rule redundancy, this yields a computationally feasible approach for boosting optimal rules that, as we demonstrate on a wide range of common benchmark problems, consistently outperforms the boosting of greedy rules in predictive performance.