Various attribution methods have been developed to explain deep neural networks (DNNs) by inferring the attribution/importance/contribution score of each input variable to the final output. However, existing attribution methods are often built upon different heuristics, and a unified theoretical understanding of why these methods are effective and how they relate to one another is still lacking. To this end, for the first time, we formulate the core mechanisms of fourteen attribution methods, which were designed on different heuristics, into the same mathematical system, i.e., the system of Taylor interactions. Specifically, we prove that the attribution scores estimated by the fourteen attribution methods can all be reformulated as a weighted sum of two types of effects, i.e., independent effects of each individual input variable and interaction effects between input variables. The essential difference among the fourteen attribution methods mainly lies in the weights with which they allocate these effects. Based on the above findings, we propose three principles for a fair allocation of effects to evaluate the faithfulness of the fourteen attribution methods.
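To make the "weighted sum of independent and interaction effects" reading concrete, the sketch below uses a toy two-variable model that is purely an assumption for illustration and is not taken from the paper: f(x1, x2) = a*x1 + b*x2 + c*x1*x2, explained against an all-zero baseline. Under that assumption, a*x1 and b*x2 play the role of independent effects and c*x1*x2 is the single interaction effect. The function names and coefficient values are hypothetical. The sketch compares two well-known attribution methods, Gradient x Input and the Shapley value, and recovers the weight each one places on the interaction effect.

```python
# Toy model (an illustrative assumption, not the paper's setup):
#   f(x1, x2) = a*x1 + b*x2 + c*x1*x2, with an all-zero baseline.
A, B, C = 2.0, 3.0, 4.0


def f(x1, x2):
    return A * x1 + B * x2 + C * x1 * x2


def effects(x1, x2):
    """Independent effect of x1, independent effect of x2, and their interaction effect."""
    return A * x1, B * x2, C * x1 * x2


def gradient_x_input(x1, x2):
    """Gradient x Input, using the analytic partial derivatives of f."""
    df_dx1 = A + C * x2
    df_dx2 = B + C * x1
    return df_dx1 * x1, df_dx2 * x2


def shapley(x1, x2):
    """Exact Shapley values for two variables, masking absent variables to 0."""
    v_none, v_1, v_2, v_both = f(0.0, 0.0), f(x1, 0.0), f(0.0, x2), f(x1, x2)
    phi1 = 0.5 * (v_1 - v_none) + 0.5 * (v_both - v_2)
    phi2 = 0.5 * (v_2 - v_none) + 0.5 * (v_both - v_1)
    return phi1, phi2


def interaction_weights(attribution, x1, x2):
    """Solve attribution_i = independent_i + w_i * interaction for w_i."""
    ind1, ind2, inter = effects(x1, x2)
    return (attribution[0] - ind1) / inter, (attribution[1] - ind2) / inter


x1, x2 = 1.0, 2.0
gxi = gradient_x_input(x1, x2)
shap = shapley(x1, x2)
print("Gradient x Input:", gxi, "interaction weights:", interaction_weights(gxi, x1, x2))
print("Shapley value:   ", shap, "interaction weights:", interaction_weights(shap, x1, x2))
print("f(x) - f(0):     ", f(x1, x2) - f(0.0, 0.0))
```

In this toy case both methods assign each variable its own independent effect in full and differ only in the interaction weight: Gradient x Input gives the whole interaction to each variable, so its scores sum to more than f(x) - f(0), while the Shapley value splits the interaction equally and its scores sum exactly to f(x) - f(0). Real networks involve interactions of many orders, which this two-variable sketch does not capture.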