The goal of explainable Artificial Intelligence (XAI) is to generate human-interpretable explanations, but there are no computationally precise theories of how humans interpret AI-generated explanations. The lack of theory means that validation of XAI must be done empirically, on a case-by-case basis, which prevents systematic theory-building in XAI. We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. Our theory posits that, absent an explanation, humans expect the AI to make decisions similar to their own, and that they interpret an explanation by comparing it to the explanations they themselves would give. Comparison is formalized via Shepard's universal law of generalization in a similarity space, a classic theory from cognitive science. A pre-registered user study on AI image classifications with saliency map explanations demonstrates that our theory quantitatively matches participants' predictions of the AI.
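For reference, Shepard's universal law of generalization states that generalization between two stimuli decays exponentially with their distance in psychological similarity space. A minimal sketch of how such a comparison term might look in this setting, assuming a distance $d$ between the AI's saliency map $e_{\mathrm{AI}}$ and the explanation $e_{\mathrm{H}}$ the explainee would themselves give (the paper's exact distance measure and how this similarity enters the explainee's prediction are not specified here), is:
\[
  s(e_{\mathrm{AI}}, e_{\mathrm{H}}) \;=\; \exp\!\big(-c\, d(e_{\mathrm{AI}}, e_{\mathrm{H}})\big), \qquad c > 0,
\]
so that explanations closer to the explainee's own yield higher perceived similarity, and hence stronger generalization from the explainee's own decision to the AI's.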