One major barrier to applications of deep Reinforcement Learning (RL), both inside and outside of games, is the lack of explainability. In this paper, we describe a lightweight and effective method to derive explanations for deep RL agents, which we evaluate in the Atari domain. Our method relies on a transformation of the pixel-based input of the RL agent into an interpretable, percept-like input representation. We then train a surrogate model, which is itself interpretable, to replicate the behavior of the target deep RL agent. Our experiments demonstrate that we can learn an effective surrogate that accurately approximates the underlying decision making of a target agent on a suite of Atari games.
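To make the approach concrete, the sketch below shows one plausible instantiation of the surrogate-training step: roll out the target deep RL agent, record its actions alongside an interpretable, percept-like feature vector for each frame, and fit a shallow decision tree to imitate those action choices. The names `target_agent`, `extract_percepts`, and the choice of a decision tree as the interpretable model are illustrative assumptions, not the paper's specific implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical stand-ins for the components described in the abstract:
#   target_agent.act(frame)   -> action chosen by the trained deep RL agent
#   extract_percepts(frame)   -> interpretable feature vector (e.g. object
#                                positions/velocities) derived from raw pixels
def collect_dataset(env, target_agent, extract_percepts, n_steps=50_000):
    """Roll out the target agent and record (percept features, action) pairs."""
    X, y = [], []
    frame = env.reset()
    for _ in range(n_steps):
        action = target_agent.act(frame)
        X.append(extract_percepts(frame))
        y.append(action)
        frame, _, done, _ = env.step(action)
        if done:
            frame = env.reset()
    return np.array(X), np.array(y)

def fit_surrogate(X, y, max_depth=8):
    """Fit an interpretable surrogate (here a shallow decision tree) that
    imitates the deep agent's action choices from percept-like features."""
    surrogate = DecisionTreeClassifier(max_depth=max_depth)
    surrogate.fit(X, y)
    return surrogate

# Fidelity: how often the surrogate picks the same action as the deep agent
# on held-out states, e.g.:
#   fidelity = accuracy_score(y_test, surrogate.predict(X_test))
```

Under this reading, "accurately approximates the underlying decision making" corresponds to high action-matching fidelity between the surrogate and the target agent on held-out game states.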