Deep Reinforcement Learning (DRL) combines classic reinforcement learning algorithms with deep neural networks. A central problem in DRL is that the underlying CNNs are black boxes, making it hard to understand an agent's decision-making process. To deploy RL agents in environments that are highly dangerous for humans and machines, developers need a debugging tool to verify that an agent behaves as expected. Currently, rewards are the primary means of judging how well an agent is learning. However, this can lead to deceptive conclusions if the agent earns more reward by memorizing a policy rather than by learning to respond to the environment. This work shows that this problem can be detected with the help of gradient-based visualization techniques. It transfers some of the best-known visualization methods from the field of image classification to Deep Reinforcement Learning. Furthermore, two new visualization techniques are developed, one of which yields particularly good results. The extent to which these algorithms are applicable to reinforcement learning is examined, as is the question of how well DRL agents can be visualized across different environments with varying visualization techniques.
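To make the core idea concrete, the following is a minimal sketch of vanilla gradient saliency, the simplest of the gradient-based visualization techniques mentioned above: the saliency map is the absolute gradient of the chosen action's Q-value with respect to the input observation. The tiny fully connected Q-network here is a hypothetical stand-in, not the agent architecture used in this work; backpropagation is written out by hand so the example stays framework-free.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy Q-network: observation -> hidden (ReLU) -> Q-values.
W1 = rng.normal(size=(16, 8)) * 0.1   # input layer weights
W2 = rng.normal(size=(4, 16)) * 0.1   # output layer weights (4 actions)

def q_values_and_saliency(obs):
    # Forward pass.
    h_pre = W1 @ obs
    h = np.maximum(h_pre, 0.0)        # ReLU
    q = W2 @ h                        # one Q-value per action
    a = int(np.argmax(q))             # greedy action

    # Backward pass: gradient of Q[a] with respect to the observation.
    dq_dh = W2[a]                     # dQ[a] / dh
    dq_dhpre = dq_dh * (h_pre > 0)    # ReLU gate
    dq_dobs = W1.T @ dq_dhpre         # chain rule back to the input

    # Saliency = |gradient|: which input dimensions the decision depends on.
    return q, a, np.abs(dq_dobs)

obs = rng.normal(size=8)
q, action, saliency = q_values_and_saliency(obs)
print("greedy action:", action)
print("saliency map:", saliency.round(3))
```

For image observations the same gradient would be reshaped back into a heatmap over pixels; an agent that has merely memorized a trajectory tends to show saliency that ignores the task-relevant objects.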