Explaining the behavior of AI systems is an important problem that, in practice, is generally avoided. While the XAI community has been developing an abundance of techniques, most incur a set of costs that the wider deep learning community has, in most situations, been unwilling to pay. We take a pragmatic view of the issue, and define a set of desiderata that capture both the ambitions of XAI and the practical constraints of deep learning. We describe an effective way to satisfy all the desiderata: train the AI system to build a causal model of itself. We develop an instance of this solution for Deep RL agents: Causal Self-Talk (CST). CST operates by training the agent to communicate with itself across time. We implement this method in a simulated 3D environment, and show how it enables agents to generate faithful and semantically meaningful explanations of their own behavior. Beyond explanations, we also demonstrate that these learned models provide new ways of building semantic control interfaces to AI systems.
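To make the core idea of "communicating with itself across time" concrete, below is a minimal sketch of what such an agent could look like, assuming a PyTorch-style recurrent policy. The class name `SelfTalkAgent`, the layer sizes, and the auxiliary grounding loss are illustrative assumptions, not the paper's actual architecture or training setup.

```python
import torch
import torch.nn as nn

class SelfTalkAgent(nn.Module):
    """Illustrative recurrent agent that writes a latent 'message' which is
    fed back as input at the next timestep (self-talk across time).
    All names and dimensions are assumptions for the sketch."""

    def __init__(self, obs_dim=64, hid_dim=128, msg_dim=16, n_actions=8, n_labels=10):
        super().__init__()
        self.core = nn.GRUCell(obs_dim + msg_dim, hid_dim)
        self.policy = nn.Linear(hid_dim, n_actions)
        self.speaker = nn.Linear(hid_dim, msg_dim)   # writes the message
        self.decoder = nn.Linear(msg_dim, n_labels)  # grounds the message in semantic labels

    def forward(self, obs_seq, labels=None):
        """obs_seq: [T, B, obs_dim]; labels (optional): [T, B] semantic targets."""
        T, B, _ = obs_seq.shape
        h = obs_seq.new_zeros(B, self.core.hidden_size)
        msg = obs_seq.new_zeros(B, self.speaker.out_features)
        logits, aux_loss = [], obs_seq.new_zeros(())
        for t in range(T):
            # The previous message is part of this step's input, so the agent
            # literally talks to its future self.
            h = self.core(torch.cat([obs_seq[t], msg], dim=-1), h)
            logits.append(self.policy(h))
            msg = torch.tanh(self.speaker(h))
            if labels is not None:
                # Auxiliary grounding loss (assumed): the message must decode
                # into a human-interpretable label, which is what lets it serve
                # as an explanation of the agent's behavior.
                aux_loss = aux_loss + nn.functional.cross_entropy(
                    self.decoder(msg), labels[t])
        return torch.stack(logits), aux_loss
```

In a sketch like this, the semantic control interface mentioned in the abstract would amount to overwriting `msg` at test time with the encoding of a chosen label, so that the injected message causally steers the agent's subsequent behavior.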