Although deep Reinforcement Learning (RL) has proven successful in a wide range of tasks, one challenge it faces when applied to real-world problems is interpretability. Saliency maps are frequently used to provide interpretability for deep neural networks. However, in the RL domain, existing saliency map approaches are either computationally expensive, and thus unable to satisfy the real-time requirements of real-world scenarios, or fail to produce interpretable saliency maps for RL policies. In this work, we propose Distillation with selective Input Gradient Regularization (DIGR), an approach that uses policy distillation and input gradient regularization to produce new policies that achieve both high interpretability and computational efficiency in generating saliency maps. We also find that our approach improves the robustness of RL policies to multiple adversarial attacks. We conduct experiments on three tasks, MiniGrid (Fetch Object), Atari (Breakout), and CARLA autonomous driving, to demonstrate the importance and effectiveness of our approach.
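To make the combination of policy distillation and selective input gradient regularization concrete, the following is a minimal sketch of one plausible form of such an objective. It is not the paper's exact formulation: the function name `digr_loss`, the mask-based selection, the weight `beta`, and the KL direction are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def digr_loss(student, teacher, obs, mask, beta=1.0):
    """Hypothetical sketch of a combined objective: a policy-distillation
    term (KL divergence between teacher and student action distributions)
    plus a selective input-gradient penalty on regions outside a given
    saliency mask. Exact details in the paper may differ.
    """
    obs = obs.clone().requires_grad_(True)

    # Policy distillation: match the student to the (frozen) teacher policy.
    with torch.no_grad():
        teacher_logits = teacher(obs)
    student_logits = student(obs)
    distill = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )

    # Selective input-gradient regularization: penalize gradients of the
    # student's outputs w.r.t. input regions outside the mask
    # (mask == 1 marks task-relevant regions left unregularized).
    grads = torch.autograd.grad(student_logits.sum(), obs, create_graph=True)[0]
    grad_penalty = ((grads * (1.0 - mask)) ** 2).mean()

    return distill + beta * grad_penalty
```

Under this sketch, the saliency map of the distilled policy can be read off directly from its input gradients, which is what makes generation cheap enough for real-time use.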