Attribution methods provide insight into the decision-making process of machine learning models, especially deep neural networks, by assigning a contribution score to each individual feature. However, the attribution problem has not been well defined, so there is no unified guideline for the contribution assignment process. Furthermore, existing attribution methods are often built upon various empirical intuitions and heuristics. A general theoretical framework is still lacking that can both describe the attribution problem well and be used to unify and revisit existing attribution methods. To bridge this gap, we propose a Taylor attribution framework, which models the attribution problem as deciding individual payoffs in a coalition. We then reformulate fourteen mainstream attribution methods within the Taylor framework and analyze them in terms of rationale, fidelity, and limitations. Moreover, we establish three principles for a good attribution in the Taylor attribution framework: low approximation error, correct Taylor contribution assignment, and unbiased baseline selection. Finally, we empirically validate the Taylor reformulations and, by benchmarking on real-world datasets, reveal a positive correlation between attribution performance and the number of principles an attribution method follows.
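As a rough illustration of the quantities the abstract names, the sketch below computes a first-order Taylor attribution for a toy two-feature function. It is not the paper's formulation: the function `f`, its gradient `grad_f`, the helper `first_order_taylor_attribution`, and the all-zero baseline are hypothetical choices made only for this example. It shows how a first-order expansion around a baseline assigns one score per feature, and how the part of the output it fails to explain (here, a feature interaction) appears as approximation error.

```python
def f(x):
    # Hypothetical toy model: two linear terms plus one pairwise interaction.
    return 3.0 * x[0] + 2.0 * x[1] + x[0] * x[1]


def grad_f(x):
    # Analytic gradient of the toy model above.
    return [3.0 + x[1], 2.0 + x[0]]


def first_order_taylor_attribution(x, baseline):
    # First-order expansion around the baseline:
    #   f(x) ~ f(baseline) + sum_i g_i * (x_i - baseline_i)
    # Each summand is taken as the score of feature i.
    g = grad_f(baseline)
    return [g_i * (x_i - b_i) for g_i, x_i, b_i in zip(g, x, baseline)]


x = [1.0, 2.0]
baseline = [0.0, 0.0]  # assumed baseline; other choices change the scores

scores = first_order_taylor_attribution(x, baseline)

# Approximation error: the change in output that the first-order terms miss.
# For this toy model it equals the interaction term x[0] * x[1].
error = f(x) - f(baseline) - sum(scores)

print(scores, error)
```

In this toy case the nonzero error comes entirely from the interaction term, which a first-order expansion cannot assign to either feature; changing `baseline` changes the scores, which is why baseline choice matters for any expansion of this kind.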