When explaining AI behavior to humans, how is the communicated information comprehended by the human explainee, and does it match what the explanation attempted to communicate? When can we say that an explanation is explaining something? We aim to provide an answer by leveraging the theory-of-mind literature on the folk concepts that humans use to understand behavior. We establish a framework of social attribution by the human explainee, which describes the function of explanations: the concrete information that humans comprehend from them. Specifically, effective explanations should be coherent (communicating information that generalizes to other contrast cases), complete (communicating an explicit contrast case, objective causes, and subjective causes), and interactive (surfacing and resolving contradictions to the generalization property through iterations). We demonstrate that many XAI mechanisms can be mapped to folk concepts of behavior. This mapping allows us to uncover the failure modes that prevent current methods from explaining effectively, and to identify what is necessary to enable coherent explanations.