As machine learning is increasingly applied to high-impact, high-risk domains, a number of new methods have emerged to make AI models more human-interpretable. Despite the recent growth of interpretability work, there is a lack of systematic evaluation of proposed techniques. In this work, we propose HIVE (Human Interpretability of Visual Explanations), a novel human evaluation framework for visual interpretability methods that allows for falsifiable hypothesis testing, cross-method comparison, and human-centered evaluation. To the best of our knowledge, this is the first work of its kind. Using HIVE, we conduct IRB-approved human studies with nearly 1000 participants and evaluate four methods that represent the diversity of interpretability research in computer vision: GradCAM, BagNet, ProtoPNet, and ProtoTree. Our results suggest that explanations engender human trust, even for incorrect predictions, yet they are not distinct enough for users to distinguish between correct and incorrect predictions. We open-source HIVE to enable future studies and to encourage more human-centered approaches to interpretability research.