In recent years there has been growing attention to interpretable machine learning models which can provide explanatory insights into their behavior. Thanks to their interpretability, decision trees have been intensively studied for classification tasks, and due to the remarkable advances in mixed-integer programming (MIP), various approaches have been proposed to formulate the problem of training an Optimal Classification Tree (OCT) as a MIP model. We present a novel mixed-integer quadratic formulation for the OCT problem, which exploits the generalization capabilities of Support Vector Machines for binary classification. Our model, denoted as Margin Optimal Classification Tree (MARGOT), uses maximum-margin multivariate hyperplanes nested in a binary tree structure. To enhance the interpretability of our approach, we analyse two alternative versions of MARGOT, which include feature selection constraints inducing local sparsity of the hyperplanes. First, MARGOT is tested on non-linearly separable synthetic datasets in a 2-dimensional feature space to provide a graphical representation of the maximum-margin approach. The proposed models are then tested on benchmark datasets from the UCI repository. The MARGOT formulation turns out to be easier to solve than other OCT approaches, and the generated tree generalizes better to new observations. The two interpretable versions are effective in selecting the most relevant features while maintaining good prediction quality.
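To make the core idea concrete, the sketch below is a minimal, hedged illustration of maximum-margin hyperplanes nested in a binary tree: it fits a linear SVM split greedily at each internal node of a depth-2 tree on a non-linearly separable 2-D dataset. This is not the paper's MARGOT formulation, which optimizes all splits jointly via a single mixed-integer quadratic program; the class and function names here are our own assumptions for illustration only.

```python
# Greedy sketch of margin-based tree splits (NOT the MARGOT MIQP model,
# which trains all hyperplanes jointly in one mixed-integer program).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import LinearSVC

class MarginTreeNode:
    """A node holding either a max-margin linear split or a majority label."""
    def __init__(self, depth, max_depth):
        self.depth, self.max_depth = depth, max_depth
        self.svm = None    # maximum-margin hyperplane at this node
        self.left = self.right = None
        self.label = None  # majority label if this node is a leaf

    def fit(self, X, y):
        # Become a leaf at maximum depth or when the node is pure.
        if self.depth == self.max_depth or len(np.unique(y)) < 2:
            self.label = np.bincount(y).argmax()
            return self
        # Fit a maximum-margin multivariate hyperplane (linear SVM).
        self.svm = LinearSVC(C=1.0).fit(X, y)
        side = self.svm.decision_function(X) >= 0
        # Degenerate split: fall back to a leaf.
        if side.all() or (~side).all():
            self.svm, self.label = None, np.bincount(y).argmax()
            return self
        child = lambda: MarginTreeNode(self.depth + 1, self.max_depth)
        self.left = child().fit(X[~side], y[~side])
        self.right = child().fit(X[side], y[side])
        return self

    def predict_one(self, x):
        if self.svm is None:
            return self.label
        go_right = self.svm.decision_function(x.reshape(1, -1))[0] >= 0
        return (self.right if go_right else self.left).predict_one(x)

# Non-linearly separable 2-D data, in the spirit of the synthetic experiments.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
tree = MarginTreeNode(depth=0, max_depth=2).fit(X, y)
preds = np.array([tree.predict_one(x) for x in X])
print("training accuracy:", (preds == y).mean())
```

Unlike this greedy heuristic, which fixes each split before fitting the next, the MIQP formulation described in the abstract selects all nested hyperplanes simultaneously, which is what allows margin maximization to be coordinated across the whole tree.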