Complex black-box machine learning models are regularly used in critical decision-making domains. This has given rise to several calls for algorithmic explainability. Many explanation algorithms proposed in the literature assign importance to each feature individually. However, such explanations fail to capture the joint effects of sets of features. Indeed, few works so far formally analyze high-dimensional model explanations. In this paper, we propose a novel high-dimensional model explanation method that captures the joint effect of feature subsets. We propose a new axiomatization for a generalization of the Banzhaf index; our method can also be thought of as an approximation of a black-box model by a higher-order polynomial. In other words, this work justifies the use of the generalized Banzhaf index as a model explanation by showing that it uniquely satisfies a set of natural desiderata and that it is the optimal local approximation of a black-box model. Our empirical evaluation highlights how our measure captures desirable behavior, whereas other measures that do not satisfy our axioms behave in an unpredictable manner.
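To give a concrete sense of the kind of quantity involved, the standard Banzhaf interaction index from cooperative game theory assigns to a feature subset T the average, over all coalitions S disjoint from T, of the discrete derivative of the value function with respect to T. The following is a minimal brute-force sketch of that classical definition (exponential in the number of features); it illustrates the general idea only and does not reproduce the paper's specific axiomatization or approximation method.

```python
from itertools import chain, combinations

def powerset(items):
    """All subsets of items, as tuples, from the empty set up to items itself."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def banzhaf_interaction(value, features, subset):
    """Banzhaf interaction index of `subset` (T) under `value` (v):

        I(T) = 2^{-(n - |T|)} * sum_{S ⊆ N \\ T} sum_{L ⊆ T} (-1)^{|T| - |L|} v(S ∪ L)

    where the inner sum is the discrete derivative of v with respect to T at S.
    `value` is any set function mapping a frozenset/set of features to a float.
    """
    subset = list(subset)
    rest = [f for f in features if f not in subset]
    total = 0.0
    for S in powerset(rest):
        for L in powerset(subset):
            sign = (-1) ** (len(subset) - len(L))
            total += sign * value(set(S) | set(L))
    return total / 2 ** len(rest)
```

For example, for the value function v(S) = 1 iff features 0 and 1 are both present (a pure pairwise interaction), the index of the pair {0, 1} is 1 while each singleton receives 0.5, reflecting that the effect is jointly, not individually, attributable.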