The effectiveness of feature attribution methods and its correlation with automatic evaluation scores

Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stakes applications. Hundreds of papers have proposed new feature attribution methods, or have discussed or used these tools in their work. However, although humans are the intended end-users, most attribution methods have been evaluated only on proxy automatic evaluation metrics. In this paper, we conduct the first large-scale user study, with 320 lay users and 11 expert users, to shed light on how effectively state-of-the-art attribution methods assist humans in ImageNet classification and Stanford Dogs fine-grained classification, on both natural images and images containing adversarial perturbations. We found that, overall, feature attribution is surprisingly no more effective than showing humans the nearest training-set examples. On the hard task of fine-grained dog categorization, presenting attribution maps to humans does not help; instead, it hurts the performance of human-AI teams compared to the AI alone. Importantly, we found that automatic attribution-map evaluation measures correlate poorly with actual human-AI team performance. Our findings encourage the community to rigorously test their methods on downstream human-in-the-loop applications and to rethink the existing evaluation metrics.

翻译：解释人工智能模式决定在许多现实世界、高比例的应用中越来越重要。数百篇论文要么提出了新的特征归属方法,讨论或在其工作中利用了这些工具。然而,尽管人类是目标最终用户,但大多数属性方法只根据代理自动评价衡量标准进行评估。在本文中,我们对320个非专业和11个专家用户进行第一次、大规模用户研究,以揭示在图像网络分类、斯坦福狗精密分类和这两个任务中帮助人类的最先进的归属方法的有效性,但在输入图像图像图像包含对抗性扰动图像时,我们发现,总体而言,特征归属并不比展示人类最接近的训练范例更为有效。在精细的狗分类这一艰巨任务中,向人类提供归属图没有帮助,相反地损害了人类个体团队的绩效。重要的是,我们发现,自动属性映射评价措施与人类-AI团队的实际业绩不相符。我们发现,总体而言,特征归属并不比展示人类最近的训练范例要有效。我们发现,要鼓励社区严格地检验其下游评估应用的方法。