The use of deep neural networks as function approximators has led to striking progress in reinforcement learning algorithms and applications. Yet our knowledge of the decision boundary geometry and the loss landscape of neural policies remains quite limited. In this paper we propose a framework to investigate similarities in decision boundaries and loss landscapes across states and across MDPs. We conduct experiments in various games from the Arcade Learning Environment and discover that high-sensitivity directions for neural policies are correlated across MDPs. We argue that these high-sensitivity directions support the hypothesis that non-robust features are shared across the training environments of reinforcement learning agents. We believe our results reveal fundamental properties of the environments used in deep reinforcement learning training and represent a tangible step towards building robust and reliable deep reinforcement learning agents.