Deep reinforcement learning (RL) agents are becoming increasingly proficient in a range of complex control tasks. However, the agent's behavior is usually difficult to interpret due to the black-box function approximators involved, making it hard to earn users' trust. Although there have been some interesting interpretation methods for vision-based RL, most of them cannot uncover temporal causal information, raising questions about their reliability. To address this problem, we present a temporal-spatial causal interpretation (TSCI) model to understand the agent's long-term behavior, which is essential for sequential decision-making. The TSCI model builds on a formulation of temporal causality that reflects the temporal causal relations between sequential observations and the decisions of an RL agent. A separate causal discovery network is then employed to identify temporal-spatial causal features, which are constrained to satisfy temporal causality. The TSCI model is applicable to recurrent agents and, once trained, can discover causal features efficiently. Empirical results show that the TSCI model produces high-resolution, sharp attention masks that highlight task-relevant temporal-spatial information constituting most of the evidence about how vision-based RL agents make sequential decisions. We further demonstrate that our method can provide valuable causal interpretations for vision-based RL agents from the temporal perspective.
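To make the pipeline described above concrete, below is a minimal, hypothetical PyTorch sketch of a causal discovery network that emits per-frame soft attention masks over a sequence of observations. The names (CausalDiscoveryNet, temporal_causality_loss) and the surrogate objective (behavior matching between masked and unmasked observations plus a sparsity penalty) are illustrative assumptions on my part, not the paper's actual architecture or loss.

```python
# Hypothetical sketch of a causal discovery network with a temporal
# causality surrogate objective. Architecture and loss are assumptions,
# not the TSCI paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDiscoveryNet(nn.Module):
    """Predicts a soft spatial mask per frame from a sequence of observations."""
    def __init__(self, in_channels=3, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        # A recurrent module captures temporal structure; here a GRU over
        # spatially pooled features gates a per-frame mask head.
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.mask_head = nn.Conv2d(hidden, 1, 1)

    def forward(self, obs_seq):
        # obs_seq: (B, T, C, H, W) -> masks: (B, T, 1, H, W) in [0, 1]
        B, T, C, H, W = obs_seq.shape
        feats = self.encoder(obs_seq.reshape(B * T, C, H, W))
        pooled = feats.mean(dim=(2, 3)).reshape(B, T, -1)
        gates, _ = self.rnn(pooled)                       # (B, T, hidden)
        gated = feats * gates.reshape(B * T, -1, 1, 1)
        masks = torch.sigmoid(self.mask_head(gated))
        return masks.reshape(B, T, 1, H, W)

def temporal_causality_loss(agent, obs_seq, masks, sparsity_weight=0.01):
    """Behavior-matching surrogate: the agent's decisions on masked
    observations should reproduce its decisions on the originals, while a
    sparsity penalty keeps only the causal features in the masks."""
    with torch.no_grad():
        target_logits = agent(obs_seq.flatten(0, 1))      # frozen agent
    masked_logits = agent((obs_seq * masks).flatten(0, 1))
    behavior = F.kl_div(F.log_softmax(masked_logits, dim=-1),
                        F.softmax(target_logits, dim=-1),
                        reduction="batchmean")
    return behavior + sparsity_weight * masks.mean()
```

Under these assumptions, the agent's policy network stays frozen while only the discovery network is trained, so the masks explain the agent's decisions rather than alter them; at inference time a single forward pass yields the attention masks, consistent with the efficiency claim in the abstract.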