Explainable machine learning provides tools to better understand predictive models and their decisions, but many such methods are limited to producing insights with respect to a single class. When explanations are generated for several classes, reasoning over them to obtain a complete view may be difficult since they can present competing or contradictory evidence. To address this issue we introduce a novel paradigm of multi-class explanations. We outline the theory behind such techniques and propose a local surrogate model based on multi-output regression trees -- called LIMEtree -- which offers faithful and consistent explanations of multiple classes for individual predictions while being post-hoc, model-agnostic and data-universal. In addition to strong fidelity guarantees, our implementation supports (interactive) customisation of the explanatory insights and delivers a range of diverse explanation types, including counterfactual statements favoured in the literature. We evaluate our algorithm with a collection of quantitative experiments, a qualitative analysis based on explainability desiderata and a preliminary user study on an image classification task, comparing it to LIME. Our contributions demonstrate the benefits of multi-class explanations and the wide-ranging advantages of our method across a diverse set of scenarios.
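To make the core mechanism concrete -- fitting a single multi-output regression tree as a local surrogate of a black-box classifier's per-class probabilities -- the following is a minimal sketch under our own assumptions (a synthetic dataset, a random forest standing in for the black box, Gaussian neighbourhood sampling and an RBF proximity kernel); it is an illustration of the idea, not the authors' reference implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

# Opaque multi-class model standing in for any black box (assumption).
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Instance whose prediction we want to explain.
instance = X[0]

# Sample perturbations around the instance and weight them by an
# (assumed) RBF proximity kernel, as in LIME-style local surrogates.
rng = np.random.default_rng(0)
neighbourhood = instance + rng.normal(scale=0.5, size=(1000, X.shape[1]))
weights = np.exp(-np.linalg.norm(neighbourhood - instance, axis=1) ** 2)

# Query the black box for *all* class probabilities, then fit one
# multi-output regression tree that approximates every class at once --
# a single, consistent surrogate instead of one model per class.
targets = black_box.predict_proba(neighbourhood)
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(neighbourhood, targets, sample_weight=weights)

# The tree's predictions cover all classes for this prediction.
print(surrogate.predict(instance.reshape(1, -1)))
```

Because the surrogate is a single tree over all outputs, its explanations for different classes share one structure, which is what allows them to be read together without presenting contradictory evidence.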