As machine learning is increasingly applied to high-impact, high-risk domains, there have been a number of new methods aimed at making AI models more human-interpretable. Despite the recent growth of interpretability work, there is a lack of systematic evaluation of proposed techniques. In this work, we propose HIVE (Human Interpretability of Visual Explanations), a novel human evaluation framework for diverse interpretability methods in computer vision; to the best of our knowledge, this is the first work of its kind. We argue that human studies should be the gold standard for properly evaluating how interpretable a method is to human users. While human studies are often avoided due to challenges associated with cost, study design, and cross-method comparison, we describe how our framework mitigates these issues and conduct IRB-approved studies of four methods that represent the diversity of interpretability work: GradCAM, BagNet, ProtoPNet, and ProtoTree. Our results suggest that explanations (regardless of whether they are actually correct) engender human trust, yet are not distinct enough for users to distinguish between correct and incorrect predictions. Lastly, we open-source our framework to enable future studies and to encourage more human-centered approaches to interpretability.