When explaining AI behavior to humans, how does a human explainee comprehend the communicated information, and does it match what the explanation attempted to communicate? When can we say that an explanation is explaining something? We aim to provide an answer by leveraging the theory-of-mind literature on the folk concepts that humans use to understand behavior. We establish a framework of social attribution by the human explainee, which describes the function of explanations: the information that humans comprehend from them. Specifically, effective explanations should produce mental models that are coherent (communicate information which generalizes to other contrast cases), complete (communicate an explicit causal narrative of a contrast case, the representation causes, the affected representation, and the external causes), and interactive (surface and resolve contradictions to the generalization property through interrogation). We demonstrate that many XAI mechanisms can be mapped to folk concepts of behavior. This mapping allows us to uncover the failure modes that prevent current methods from explaining effectively, and what is necessary to enable coherent explanations.