Explaining the behavior of reinforcement learning agents operating in sequential decision-making settings is challenging, as their behavior is affected by a dynamic environment and delayed rewards. Methods that help users understand the behavior of such agents can roughly be divided into local explanations that analyze specific decisions and global explanations that convey the agents' general strategy. In this work, we study a novel combination of local and global explanations for reinforcement learning agents. Specifically, we combine reward decomposition, a local explanation method that exposes which components of the reward function influenced a specific decision, and HIGHLIGHTS, a global explanation method that shows a summary of the agent's behavior in decisive states. We conducted two user studies to evaluate the integration of these explanation methods and their respective benefits. Our results show significant benefits for both methods. In general, we found that the local reward decomposition was more useful for identifying the agents' priorities. However, when there was only a minor difference between the agents' preferences, the global information provided by HIGHLIGHTS further improved participants' understanding.
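To make the two explanation styles concrete, the following is a minimal illustrative sketch, assuming a tabular agent whose Q-function is decomposed into named reward components; the component names, tables, and importance measure (best-minus-worst action value, in the spirit of HIGHLIGHTS) are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only: reward decomposition (local) and a
# HIGHLIGHTS-style state-importance summary (global) for a toy
# tabular agent. All names and values here are hypothetical.
import numpy as np

# Hypothetical decomposed Q-table: component -> array of shape (n_states, n_actions)
q_components = {
    "fruit": np.array([[1.0, 0.2], [0.5, 0.4]]),
    "ghost": np.array([[-0.3, 0.1], [-0.8, 0.0]]),
}

def decomposed_q(state):
    """Local explanation: per-component Q-values for each action in `state`."""
    return {name: q[state] for name, q in q_components.items()}

def total_q(state):
    """The agent acts greedily on the sum of the component Q-values."""
    return sum(q[state] for q in q_components.values())

def state_importance(state):
    """HIGHLIGHTS-style importance: gap between the best and worst action."""
    q = total_q(state)
    return q.max() - q.min()

# Global explanation: select the top-k most important states for the summary.
k = 1
summary_states = sorted(range(2), key=state_importance, reverse=True)[:k]
print("decomposition in state 0:", decomposed_q(0))
print("summary states:", summary_states)
```

In this sketch, the local view shows which reward component drives the chosen action in a given state, while the global view selects the states whose action choice matters most and presents them as a behavior summary.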