Counterfactual explanations have emerged as a popular solution to the eXplainable AI (XAI) problem of elucidating the predictions of black-box deep-learning systems, owing to their psychological validity, flexibility across problem domains, and proposed legal compliance. Although over 100 counterfactual methods exist, each claiming to generate plausible explanations akin to those people prefer, few ($\sim7\%$) have actually been tested on users. Hence, the psychological validity of these counterfactual algorithms as effective XAI for image data has not been established. This issue is addressed here using a novel methodology that (i) gathers ground-truth, human-generated counterfactual explanations for misclassified images in two user studies and then (ii) compares these human-generated explanations to computationally generated explanations for the same misclassifications. Results indicate that humans do not "minimally edit" images when generating counterfactual explanations. Instead, they make larger, "meaningful" edits that better approximate prototypes in the counterfactual class.