Decision trees are widely used classification and regression models because of their interpretability and good accuracy. Classical methods such as CART are based on greedy approaches, but growing attention has recently been devoted to optimal decision trees. We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al. (EJOR, vol. 284, 2020; COR, vol. 132, 2021) for (sparse) optimal randomized classification trees. Sparsity is important not only for feature selection but also for improving interpretability. We first consider alternative methods to sparsify such trees based on concave approximations of the $l_{0}$ ``norm''. Promising results are obtained on 24 datasets in comparison with $l_1$ and $l_{\infty}$ regularizations. Then, we derive bounds on the VC dimension of multivariate randomized classification trees. Finally, since training is computationally challenging for large datasets, we propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method significantly reduces training times without compromising accuracy.
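To make the sparsification idea concrete, one common family of concave surrogates for the $l_0$ ``norm'' (shown here only as an illustrative sketch; the specific approximations studied in the paper may differ) replaces the count of nonzero weights $w \in \mathbb{R}^p$ with a smooth concave penalty such as the exponential one:
\[
\|w\|_0 = |\{\, j : w_j \neq 0 \,\}| \;\approx\; \sum_{j=1}^{p} \left(1 - e^{-\alpha |w_j|}\right), \qquad \alpha > 0,
\]
which converges to $\|w\|_0$ as $\alpha \to \infty$ and, unlike the $l_1$ norm, does not keep penalizing the magnitude of weights that are already nonzero, thereby driving small weights toward exact zero.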