Recent years have seen a plethora of work on explaining complex intelligent agents. One example is the development of several algorithms that generate saliency maps showing how much each pixel contributed to the agent's decision. However, most evaluations of such saliency maps focus on image classification tasks. As far as we know, there is no work that thoroughly compares different saliency maps for Deep Reinforcement Learning agents. This paper compares four perturbation-based approaches to creating saliency maps for Deep Reinforcement Learning agents trained on four different Atari 2600 games. All four approaches work by perturbing parts of the input and measuring how much this affects the agent's output. The approaches are compared using three computational metrics: dependence on the learned parameters of the agent (sanity checks), faithfulness to the agent's reasoning (input degradation), and run-time. In particular, during the sanity checks we find issues with two approaches and propose a solution to fix one of those issues.
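To make the shared idea behind the four approaches concrete, the following is a minimal sketch of perturbation-based saliency, assuming a single-frame input and a toy agent; the function name `occlusion_saliency`, the patch size, and the baseline value are illustrative choices, not the specific methods compared in the paper.

```python
import numpy as np

def occlusion_saliency(agent, frame, patch=4, baseline=0.0):
    """Occlude each patch of the input frame with a baseline value and
    record the drop in the Q-value of the agent's chosen action.
    A larger drop marks a more salient region."""
    q_ref = agent(frame)
    action = int(np.argmax(q_ref))  # the agent's greedy action
    h, w = frame.shape
    sal = np.zeros((h, w))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            perturbed = frame.copy()
            perturbed[i:i + patch, j:j + patch] = baseline
            q = agent(perturbed)
            # How much did occluding this patch change the output?
            sal[i:i + patch, j:j + patch] = q_ref[action] - q[action]
    return sal

# Toy "agent": its Q-values depend only on the top-left quadrant,
# so that quadrant should dominate the saliency map.
def toy_agent(frame):
    return np.array([frame[:4, :4].sum(), 0.0])

frame = np.ones((8, 8))
sal = occlusion_saliency(toy_agent, frame, patch=4)
```

The actual approaches differ mainly in how the perturbation is applied (e.g. occlusion versus blurring) and in how the change in the agent's output is measured, which is exactly what the sanity-check, input-degradation, and run-time metrics probe.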