Feature attribution methods are popular in interpretable machine learning. These methods compute the attribution of each input feature to represent its importance, but there is no consensus on the definition of "attribution", leading to many competing methods with little systematic evaluation, complicated in particular by the lack of ground truth attribution. To address this, we propose a dataset modification procedure to induce such ground truth. Using this procedure, we evaluate three common methods: saliency maps, rationales, and attention. We identify several deficiencies and add new perspectives to the growing body of evidence questioning the correctness and reliability of these methods applied to datasets in the wild. We further discuss possible avenues for remedy and recommend that new attribution methods be tested against ground truth before deployment. The code is available at \url{https://github.com/YilunZhou/feature-attribution-evaluation}.
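To make the idea of inducing ground truth concrete, here is a minimal sketch, not the authors' exact procedure: each example is relabeled so that its label is fully determined by an injected marker token, so any faithful attribution method must assign high importance to that token and low importance elsewhere. The function \texttt{inject\_ground\_truth} and the marker words are hypothetical illustrations.

\begin{verbatim}
import random

# Illustrative marker words per class; the label below depends only on
# which marker is inserted, giving a known ground-truth important feature.
MARKERS = {0: "alpha", 1: "bravo"}

def inject_ground_truth(text, rng):
    """Insert a class-determining marker at a random position.

    Returns the modified text, the new label, and the index of the
    marker token, which serves as the ground-truth attribution target.
    """
    label = rng.choice([0, 1])           # label is set by the marker alone
    tokens = text.split()
    pos = rng.randrange(len(tokens) + 1)
    tokens.insert(pos, MARKERS[label])
    return " ".join(tokens), label, pos

rng = random.Random(0)
modified, label, pos = inject_ground_truth("the movie was fine overall", rng)
print(modified, label, pos)
\end{verbatim}

An attribution method evaluated on a model trained on such data can then be scored by how much importance it places on the known marker position, rather than by proxy metrics alone.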