Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stakes applications. Hundreds of papers have either proposed new feature attribution methods or discussed and harnessed these tools in their work. However, despite humans being the target end-users, most attribution methods have only been evaluated on proxy automatic-evaluation metrics (Zhang et al. 2018; Zhou et al. 2016; Petsiuk et al. 2018). In this paper, we conduct the first user study to measure the effectiveness of attribution maps in assisting humans on ImageNet classification and Stanford Dogs fine-grained classification, on both natural and adversarial images (i.e., images containing adversarial perturbations). Overall, feature attribution is surprisingly no more effective than showing humans the nearest training-set examples. On the harder task of fine-grained dog categorization, presenting attribution maps to humans does not help but instead hurts the performance of human-AI teams compared to AI alone. Importantly, we found that automatic attribution-map evaluation metrics correlate poorly with actual human-AI team performance. Our findings encourage the community to rigorously test their methods on downstream human-in-the-loop applications and to rethink the existing evaluation metrics.