Deep reinforcement learning policies, despite their outstanding efficiency in simulated visual control tasks, have shown a disappointing ability to generalize across disturbances in the input training images. Changes in image statistics or distracting background elements are pitfalls that prevent generalization and real-world applicability of such control policies. We elaborate on the intuition that a good visual policy should be able to identify which pixels are important for its decision, and preserve this identification of important sources of information across images. This implies that training a policy with a small generalization gap should focus on these important pixels and ignore the others. This leads to the introduction of saliency-guided Q-networks (SGQN), a generic method for visual reinforcement learning that is compatible with any value-function learning method. SGQN vastly improves the generalization capability of Soft Actor-Critic agents and outperforms existing state-of-the-art methods on the DeepMind Control Generalization Benchmark, setting a new reference in terms of training efficiency, generalization gap, and policy interpretability.
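To make the intuition concrete, below is a minimal, hypothetical PyTorch sketch of a saliency-guided auxiliary loss for a Q-network: saliency is approximated by the magnitude of the Q-value's gradient with respect to the input pixels, the least salient pixels are masked out, and the critic is regularized to produce the same value on the masked observation. The network architecture, the input-gradient attribution, and the names `ConvQNetwork`, `saliency_mask`, `saliency_guided_loss`, `keep_ratio`, and `lam` are illustrative assumptions, not the exact SGQN algorithm described in the paper.

```python
# Illustrative sketch of a saliency-guided critic regularizer (not the paper's exact method).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvQNetwork(nn.Module):
    """Toy convolutional critic Q(s, a) for image observations (illustrative)."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # LazyLinear infers its input size (encoded pixels + action dims) on first call.
        self.head = nn.LazyLinear(1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([self.encoder(obs), act], dim=-1))


def saliency_mask(q_net, obs, act, keep_ratio=0.1):
    """Binary mask of the most Q-relevant pixels, from input-gradient saliency."""
    obs = obs.detach().clone().requires_grad_(True)
    q = q_net(obs, act).sum()
    grad, = torch.autograd.grad(q, obs)
    sal = grad.abs().amax(dim=1, keepdim=True)          # (B, 1, H, W) saliency map
    flat = sal.flatten(1)
    thresh = flat.quantile(1.0 - keep_ratio, dim=1, keepdim=True)
    return (flat >= thresh).float().view_as(sal)        # keep only the top pixels


def saliency_guided_loss(q_net, obs, act, td_loss, lam=0.5):
    """Usual TD loss plus a term keeping Q stable when non-salient pixels are removed."""
    with torch.no_grad():
        q_full = q_net(obs, act)                        # value on the full image
    mask = saliency_mask(q_net, obs, act)
    q_masked = q_net(obs * mask, act)                   # only salient pixels remain
    return td_loss + lam * F.mse_loss(q_masked, q_full)


if __name__ == "__main__":
    q_net = ConvQNetwork(in_channels=9)                 # e.g. 3 stacked RGB frames
    obs = torch.rand(4, 9, 84, 84)
    act = torch.rand(4, 6)
    td_loss = torch.tensor(0.0)                         # placeholder for the usual critic loss
    print(saliency_guided_loss(q_net, obs, act, td_loss))
```

In this sketch the consistency term penalizes the critic whenever discarding the pixels it deems unimportant changes its value estimate, which is one way to encode the abstract's requirement that the policy focus on important pixels and ignore the others.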