Tree-ensemble algorithms, such as random forests, are effective machine learning methods that are popular for their flexibility, high predictive performance, and robustness to overfitting. However, since multiple learners are combined, they are not as interpretable as a single decision tree. In this work we propose Building Explanations through a LocalLy AccuraTe Rule EXtractor (Bellatrex), a novel method that explains the forest prediction for a given test instance with only a few diverse rules. Starting from the decision trees generated by a random forest, our method 1) pre-selects a subset of the rules used to make the prediction, 2) creates a vector representation of these rules, 3) projects the representations to a low-dimensional space, and 4) clusters them, picking one rule from each cluster to explain the instance prediction. We test the effectiveness of Bellatrex on 89 real-world datasets and demonstrate the validity of our method for binary classification, regression, multi-label classification, and time-to-event tasks. To the best of our knowledge, this is the first time an interpretability toolbox can handle all these tasks within the same framework. We also show that our extracted surrogate model approximates the performance of the corresponding ensemble model in all considered tasks, while selecting only a few trees from the whole forest, and that our proposed approach substantially outperforms other explainability methods in terms of predictive performance.
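The four steps above can be sketched with off-the-shelf components. This is a minimal illustration, not the authors' implementation: the pre-selection criterion (trees whose prediction is closest to the ensemble's), the rule vectorization (feature counts along the decision path followed by the instance), the projection method (PCA), and the clustering method (KMeans) are all assumptions made for the sake of a runnable example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def explain_instance(forest, x, n_select=20, n_components=2, n_clusters=3):
    """Hypothetical sketch of the four-step pipeline for one test instance."""
    x = x.reshape(1, -1)

    # 1) Pre-select rules: keep the trees whose individual prediction is
    #    closest to the ensemble prediction (an assumed criterion).
    ensemble_pred = forest.predict_proba(x)[0, 1]
    tree_preds = np.array([t.predict_proba(x)[0, 1] for t in forest.estimators_])
    keep = np.argsort(np.abs(tree_preds - ensemble_pred))[:n_select]

    # 2) Vector representation: for each kept tree, count how often each
    #    feature is tested along the root-to-leaf path followed by x.
    vecs = []
    for i in keep:
        tree = forest.estimators_[i].tree_
        path_nodes = forest.estimators_[i].decision_path(x).indices
        v = np.zeros(forest.n_features_in_)
        for node in path_nodes:
            f = tree.feature[node]
            if f >= 0:  # internal node; leaves are marked with -2
                v[f] += 1
        vecs.append(v)
    vecs = np.array(vecs)

    # 3) Project the rule vectors to a low-dimensional space.
    low = PCA(n_components=n_components).fit_transform(vecs)

    # 4) Cluster the projections and pick, per cluster, the rule whose
    #    projection is closest to the cluster centroid.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(low)
    chosen = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(low[members] - km.cluster_centers_[c], axis=1)
        chosen.append(int(keep[members[np.argmin(dists)]]))
    return chosen  # indices of the trees whose rules explain the prediction

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
rules = explain_instance(forest, X[0])
```

The returned tree indices identify a handful of diverse rules; rendering each selected root-to-leaf path as a readable if-then rule would complete the explanation.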