Counterfactual explanations, which deal with "why not?" scenarios, can provide insightful explanations of an AI agent's behavior. In this work, we focus on generating counterfactual explanations for deep reinforcement learning (RL) agents that operate in visual input environments like Atari. We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning. Specifically, a counterfactual state illustrates the minimal change needed to an Atari game image such that the agent chooses a different action. We also evaluate the effectiveness of counterfactual states with human participants who are not machine learning experts. Our first user study investigates whether humans can discern whether a counterfactual state explanation was produced by the actual game or by a generative deep learning approach. Our second user study investigates whether counterfactual state explanations can help non-expert participants identify a flawed agent; we compare against a baseline approach based on a nearest neighbor explanation that uses images from the actual game. Our results indicate that counterfactual state explanations have sufficient fidelity to the actual game images to enable non-experts to identify a flawed RL agent more effectively than with the nearest neighbor baseline or with no explanation at all.
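To make the notion of a counterfactual state concrete, the sketch below shows one simple way such a state could be produced: perturbing an input frame as little as possible so that a fixed policy network's greedy action changes. This is a minimal illustrative sketch, assuming a hypothetical PyTorch policy over 84x84 grayscale frames; it is not the paper's method, which relies on a deep generative model rather than direct pixel optimization.

```python
# Hypothetical sketch (not the paper's generative approach): find the smallest
# pixel perturbation of a frame that flips a fixed policy's greedy action.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for a trained Atari policy: a small CNN over 84x84 grayscale frames.
policy = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 9 * 9, 6),  # 6 discrete actions, as in several Atari games
)
policy.eval()

def counterfactual_state(frame, target_action, steps=300, lam=10.0, lr=0.05):
    """Perturb `frame` as little as possible so the policy prefers `target_action`."""
    delta = torch.zeros_like(frame, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = policy((frame + delta).clamp(0, 1))
        # Push toward the counterfactual action while penalizing the size of the change.
        loss = F.cross_entropy(logits, target_action) + lam * delta.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (frame + delta).clamp(0, 1).detach()

frame = torch.rand(1, 1, 84, 84)          # placeholder game frame
original = policy(frame).argmax(dim=1)    # action the agent would take
target = (original + 1) % 6               # any different action
cf = counterfactual_state(frame, target)
print(original.item(), policy(cf).argmax(dim=1).item())
```

The counterfactual frame `cf` can then be shown next to the original frame so a viewer can see which image regions the agent's action choice depends on.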