The powerful learning ability of deep neural networks enables reinforcement learning agents to learn competent control policies directly from continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general reinforcement learning paradigm, where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" and a collapse in performance. In this paper, we present IQ, i.e., interference-aware deep Q-learning, to mitigate catastrophic interference in single-task deep reinforcement learning. Specifically, we resort to online clustering to achieve on-the-fly context division, together with a multi-head network and a knowledge distillation regularization term for preserving the policies of learned contexts. Built upon deep Q-networks, IQ consistently improves stability and performance compared to existing methods, as verified by extensive experiments on classic control and Atari tasks. The code is publicly available at: https://github.com/Sweety-dm/Interference-aware-Deep-Q-learning.
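To make the abstract's components concrete, the following is a minimal sketch (not the authors' released code) of a multi-head Q-network with one head per context and a loss that combines a TD term on the active context's head with a knowledge-distillation term that keeps the remaining heads close to a frozen copy of the network, limiting interference with previously learned contexts. All names here (MultiHeadQNet, iq_style_loss, distill_weight, n_contexts) are illustrative assumptions; the context assignment is presumed to come from the online clustering step described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadQNet(nn.Module):
    """Shared trunk with one Q-value head per context (illustrative)."""

    def __init__(self, obs_dim: int, n_actions: int, n_contexts: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # One head per context discovered by the online clustering step.
        self.heads = nn.ModuleList(nn.Linear(hidden, n_actions) for _ in range(n_contexts))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        # Shape: (batch, n_contexts, n_actions)
        return torch.stack([head(h) for head in self.heads], dim=1)


def iq_style_loss(net, frozen_net, batch, context_id, gamma=0.99, distill_weight=1.0):
    """TD loss on the active context's head plus distillation on the inactive heads."""
    obs, act, rew, next_obs, done = batch
    q_all = net(obs)                                              # (B, C, A)
    q_sa = q_all[:, context_id].gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = frozen_net(next_obs)[:, context_id].max(dim=1).values
        target = rew + gamma * (1.0 - done) * next_q
        q_teacher = frozen_net(obs)                               # teacher outputs
    td_loss = F.smooth_l1_loss(q_sa, target)
    # Regularize the heads of the other contexts toward the frozen copy,
    # preserving their learned policies against catastrophic interference.
    mask = torch.ones(q_all.shape[1], dtype=torch.bool)
    mask[context_id] = False
    distill_loss = F.mse_loss(q_all[:, mask], q_teacher[:, mask])
    return td_loss + distill_weight * distill_loss
```

In this sketch, `frozen_net` would be a periodically updated copy of `net` (e.g., `copy.deepcopy(net)`), playing the dual role of DQN target network and distillation teacher; the paper's exact update schedule and regularization weighting may differ.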