Recent years have seen a plethora of work on explaining complex intelligent agents. One example is the development of several algorithms that generate saliency maps showing how much each pixel contributed to an agent's decision. However, most evaluations of such saliency maps focus on image classification tasks. To the best of our knowledge, no prior work thoroughly compares different saliency maps for Deep Reinforcement Learning agents. This paper compares four perturbation-based approaches to creating saliency maps for Deep Reinforcement Learning agents trained on four different Atari 2600 games. All four approaches work by perturbing parts of the input and measuring how much this affects the agent's output. The approaches are compared using three computational metrics: dependence on the learned parameters of the agent (sanity checks), faithfulness to the agent's reasoning (input degradation), and run-time.
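As a rough illustration of the shared recipe behind such perturbation-based approaches (perturb part of the input, measure the change in output), the following is a minimal occlusion-style sketch. The `policy` callable, the zero-valued perturbation, and the patch size are illustrative assumptions, not any of the four specific methods compared in the paper:

```python
import numpy as np

def perturbation_saliency(policy, frame, patch=4):
    """Minimal perturbation-based saliency sketch (illustrative, not a
    specific published method): zero out square patches of the input
    frame and record how much the policy's output vector changes."""
    base = policy(frame)                      # unperturbed output
    h, w = frame.shape[:2]
    saliency = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            perturbed = frame.copy()
            perturbed[y:y + patch, x:x + patch] = 0.0  # occlude one patch
            # larger output change => patch mattered more to the decision
            saliency[y:y + patch, x:x + patch] = np.linalg.norm(
                policy(perturbed) - base)
    return saliency
```

The four approaches compared in the paper differ mainly in how the perturbation is constructed (e.g., what replaces the occluded region) and how the output change is aggregated into the map.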