Ensembles of decision trees perform well on many problems but are not interpretable. In contrast to existing interpretability approaches that focus on explaining relationships between features and predictions, we propose an alternative: interpreting tree ensemble classifiers by surfacing representative points for each class -- prototypes. We introduce a new distance for Gradient Boosted Tree models and propose new, adaptive prototype selection methods with theoretical guarantees that have the flexibility to choose a different number of prototypes in each class. We demonstrate our methods on random forests and gradient boosted trees, showing that the prototypes, when used as a nearest-prototype classifier, can perform as well as or even better than the original tree ensemble. In a user study, humans were better at predicting the output of a tree ensemble classifier when given prototypes than when given Shapley values, a popular feature attribution method. Hence, prototypes present a viable alternative to feature-based explanations for tree ensembles.
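To make the nearest-prototype idea concrete, here is a minimal sketch, not the paper's method: it uses a standard random-forest proximity (the fraction of trees in which two points fall in different leaves) as the ensemble distance, and a naive per-class pick of training points as hypothetical "prototypes"; the paper's proposed GBT distance and adaptive prototype selection are not reproduced here.

```python
# Minimal sketch of nearest-prototype classification over a tree-ensemble
# distance. Assumptions: random-forest leaf-co-occurrence distance and a
# naive choice of the first few training points per class as prototypes.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

def ensemble_distance(leaves_a, leaves_b):
    # Fraction of trees in which the two points land in different leaves.
    return np.mean(leaves_a[:, None, :] != leaves_b[None, :, :], axis=-1)

# Hypothetical prototype choice: the first three training points of each class
# (the paper instead selects prototypes adaptively, with guarantees).
proto_idx = np.concatenate([np.where(y_tr == c)[0][:3] for c in np.unique(y_tr)])
proto_leaves = forest.apply(X_tr[proto_idx])   # leaf index per tree
proto_labels = y_tr[proto_idx]

# Nearest-prototype prediction: each test point gets the label of the
# prototype closest to it under the ensemble distance.
dists = ensemble_distance(forest.apply(X_te), proto_leaves)
y_pred = proto_labels[np.argmin(dists, axis=1)]
print("nearest-prototype accuracy:", np.mean(y_pred == y_te))
```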