Counterfactual (CF) explanations have been employed as one of the modes of explainability in explainable AI, both to increase the transparency of AI systems and to provide recourse. Cognitive science and psychology, however, have pointed out that people regularly use CFs to express causal relationships. Most AI systems are only able to capture associations or correlations in data, so interpreting them as causal would not be justified. In this paper, we present two experiments (total N = 364) exploring the effects of CF explanations of an AI system's predictions on lay people's causal beliefs about the real world. In Experiment 1 we found that providing CF explanations of an AI system's predictions does indeed (unjustifiably) affect people's causal beliefs about the factors/features the AI uses, making them more likely to view those factors as causes in the real world. Inspired by the literature on misinformation and health warning messaging, Experiment 2 tested whether we can correct for this unjustified change in causal beliefs. We found that pointing out that AI systems capture correlations, and not necessarily causal relationships, can attenuate the effects of CF explanations on people's causal beliefs.