Many works in explainable AI have focused on explaining black-box classification models. Explaining deep reinforcement learning (RL) policies in a manner that domain users can understand has received much less attention. In this paper, we propose a novel perspective on understanding RL policies based on identifying important states from automatically learned meta-states. The key conceptual difference between our approach and many previous ones is that we form meta-states based on locality governed by the expert policy dynamics rather than on similarity of actions, and that we do not assume any particular knowledge of the underlying topology of the state space. Theoretically, we show that our algorithm for finding meta-states converges and that the objective selecting important states from each meta-state is submodular, enabling efficient, high-quality greedy selection. Experiments on four domains (four rooms, door-key, minipacman, and pong) and a carefully conducted user study illustrate that our perspective leads to better understanding of the policy. We conjecture that this is because our meta-states are more intuitive: the corresponding important states are strong indicators of tractable intermediate goals that are easier for humans to interpret and follow.
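To make the selection step concrete, the following is a minimal sketch of greedy maximization of a monotone submodular objective, the standard procedure with a (1 - 1/e) approximation guarantee that such submodularity enables. The coverage-style objective f and the candidate states here are illustrative assumptions, not the paper's actual importance objective over meta-states.

```python
# Minimal sketch: greedy selection under a monotone submodular objective.
# The coverage-style objective below is a stand-in for illustration only.
from typing import Callable, FrozenSet, Iterable, List, Set


def greedy_select(candidates: Iterable[int],
                  f: Callable[[FrozenSet[int]], float],
                  k: int) -> List[int]:
    """Repeatedly add the candidate with the largest marginal gain.
    For monotone submodular f this achieves the classic (1 - 1/e) bound."""
    selected: Set[int] = set()
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        base = f(frozenset(selected))
        best, best_gain = None, 0.0
        for s in remaining:
            gain = f(frozenset(selected | {s})) - base
            if best is None or gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            break
        selected.add(best)
        remaining.remove(best)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical example: each candidate state "covers" nearby states in its
    # meta-state; f counts covered states, which is monotone and submodular.
    coverage = {0: {0, 1, 2}, 1: {2, 3}, 2: {4, 5, 6}, 3: {1, 6}}

    def f(S: FrozenSet[int]) -> float:
        covered = set().union(*(coverage[s] for s in S)) if S else set()
        return float(len(covered))

    print(greedy_select(coverage.keys(), f, k=2))  # e.g. [0, 2]
```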