Explaining the decisions of black-box models has been a central theme in the study of trustworthy ML. Numerous explanation measures have been proposed in the literature; however, none of them adopts a provably causal take on explainability. Building upon Halpern and Pearl's formal definition of a causal explanation, we derive an analogous set of axioms for the classification setting, and use them to derive three explanation measures. Our first measure is a natural adaptation of Chockler and Halpern's notion of causal responsibility, whereas the other two correspond to existing game-theoretic influence measures. We present an axiomatic treatment of our proposed indices, showing that they can be uniquely characterized by a set of desirable properties. We complement this with a computational analysis, providing probabilistic approximation schemes for all of our proposed measures. Thus, our work is the first to formally bridge the gap between model explanations, game-theoretic influence, and causal analysis.