Explaining the decisions of machine learning (ML) models is becoming a necessity in many areas where trust in a model's decisions is key to its accreditation and adoption. The ability to explain a model's decisions also makes it possible to provide a diagnosis alongside the decision itself, which is highly valuable in scenarios such as fault detection. Unfortunately, high-performance models do not exhibit the transparency needed to make their decisions fully understandable, and the black-box approaches used to explain such models lack the accuracy to trace back the exact cause of a decision for a given input. Indeed, they cannot explicitly describe the model's decision regions around that input, which is necessary to determine what influences the model towards one decision or another. We therefore asked ourselves the following question: is there a category of high-performance models among those currently in use whose decision regions in the input feature space can be explicitly and exactly characterised geometrically? Surprisingly, the answer is positive for any model in the category of tree ensembles, which encompasses a wide range of high-performance models such as XGBoost, LightGBM and random forests. We derive an exact geometrical characterisation of their decision regions in the form of a collection of multidimensional intervals. This characterisation makes it straightforward to compute the optimal counterfactual (CF) example associated with a query point. We demonstrate several capabilities of the approach, such as computing the CF example from only a subset of features, which yields more plausible explanations by incorporating prior knowledge about which variables the user can control. An adaptation of CF reasoning to regression problems is also envisaged.
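The core geometric idea can be illustrated on the simplest member of the tree-ensemble family, a single decision tree: every leaf corresponds to an axis-aligned box (a multidimensional interval) in feature space, and the union of same-class boxes is an exact description of a decision region. The optimal counterfactual of a query point is then its projection onto the nearest box whose predicted class differs. The sketch below is a minimal illustration of this principle using scikit-learn, not the paper's actual algorithm; all function and variable names are ours, and the extension to full ensembles (intersections of boxes across trees) is omitted.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def leaf_boxes(clf, n_features):
    """Return (lower, upper, class) box for every leaf of a fitted tree."""
    t = clf.tree_
    boxes = []
    def recurse(node, lo, hi):
        if t.children_left[node] == -1:  # -1 marks a leaf in sklearn trees
            boxes.append((lo.copy(), hi.copy(), int(np.argmax(t.value[node]))))
            return
        f, thr = t.feature[node], t.threshold[node]
        hi_left = hi.copy(); hi_left[f] = min(hi[f], thr)   # x[f] <= thr branch
        recurse(t.children_left[node], lo.copy(), hi_left)
        lo_right = lo.copy(); lo_right[f] = max(lo[f], thr)  # x[f] > thr branch
        recurse(t.children_right[node], lo_right, hi.copy())
    recurse(0, np.full(n_features, -np.inf), np.full(n_features, np.inf))
    return boxes

def optimal_counterfactual(x, boxes, current_class):
    """Project x onto the closest box predicting a different class.

    The projection lands on the closed box, i.e. possibly on a split
    threshold; it realises the infimum of the distance to the region.
    """
    best, best_d = None, np.inf
    for lo, hi, cls in boxes:
        if cls == current_class:
            continue
        proj = np.clip(x, lo, hi)        # Euclidean projection onto the box
        d = np.linalg.norm(proj - x)
        if d < best_d:
            best, best_d = proj, d
    return best, best_d

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
boxes = leaf_boxes(clf, X.shape[1])
x = X[0]
cf, dist = optimal_counterfactual(x, boxes, clf.predict([x])[0])
```

Restricting the search to counterfactuals that change only user-controllable features, as mentioned above, amounts to projecting onto each box while clamping the immutable coordinates to their values in `x`.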