Counterfactual explanations are a common tool for explaining artificial intelligence models. For Reinforcement Learning (RL) agents, they answer "Why not?" or "What if?" questions by illustrating the minimal change to a state that is needed for the agent to choose a different action. Generating counterfactual explanations for RL agents with visual input is especially challenging because of their large state spaces and because their decisions are part of an overarching policy, which includes long-term decision-making. However, research focusing on counterfactual explanations specifically for RL agents with visual input is scarce and does not go beyond identifying defective agents. It is unclear whether counterfactual explanations are still helpful for more complex tasks, such as analyzing the learned strategies of different agents or choosing a fitting agent for a specific task. We propose a novel yet simple method to generate counterfactual explanations for RL agents by formulating the problem as a domain transfer problem, which allows the use of adversarial learning techniques like StarGAN. Our method is fully model-agnostic, and we demonstrate that it outperforms the only previous method on several computational metrics. Furthermore, we show in a user study that our method performs best when analyzing which strategies different agents pursue.
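The domain-transfer framing can be illustrated with a small StarGAN-style sketch: each action of the policy is treated as an image domain, and a conditional generator translates a state into the domain of a desired counterfactual action while a discriminator judges realism and action-domain membership. The module names, architecture sizes, and loss weights below are illustrative assumptions, not the paper's implementation.

```python
# Minimal StarGAN-style sketch for action-conditioned counterfactual states.
# All names and hyperparameters here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Translate a state image into the domain of a target action."""
    def __init__(self, n_actions, channels=3):
        super().__init__()
        # The target action is broadcast as extra input channels, as in StarGAN.
        self.net = nn.Sequential(
            nn.Conv2d(channels + n_actions, 64, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, channels, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, state, action_onehot):
        b, _, h, w = state.shape
        a = action_onehot.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.net(torch.cat([state, a], dim=1))

class Discriminator(nn.Module):
    """Score realism and classify which action domain an image belongs to."""
    def __init__(self, n_actions, channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.adv_head = nn.Linear(64, 1)          # real vs. generated
        self.cls_head = nn.Linear(64, n_actions)  # action-domain classifier

    def forward(self, x):
        h = self.features(x)
        return self.adv_head(h), self.cls_head(h)

def generator_loss(G, D, state, source_action, target_action, n_actions, lam_cyc=10.0):
    """StarGAN-style generator objective: fool D, land in the target action's
    domain, and stay close to the original state via cycle consistency."""
    tgt = F.one_hot(target_action, n_actions).float()
    src = F.one_hot(source_action, n_actions).float()
    counterfactual = G(state, tgt)
    adv, cls = D(counterfactual)
    loss_adv = F.binary_cross_entropy_with_logits(adv, torch.ones_like(adv))
    loss_cls = F.cross_entropy(cls, target_action)
    loss_cyc = F.l1_loss(G(counterfactual, src), state)  # reconstruct the original
    return loss_adv + loss_cls + lam_cyc * loss_cyc

# Usage with hypothetical shapes: 64x64 RGB frames, 4 discrete actions.
# The trained agent is only queried for action labels on observed states,
# which is what keeps such an explanation method model-agnostic.
G, D = Generator(n_actions=4), Discriminator(n_actions=4)
state = torch.rand(8, 3, 64, 64) * 2 - 1
source_action = torch.randint(0, 4, (8,))
target_action = torch.randint(0, 4, (8,))
loss = generator_loss(G, D, state, source_action, target_action, n_actions=4)
loss.backward()
```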