Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with updates that incentivize the online network to remain in the proximity of the target network. This improves the robustness of deep reinforcement learning in the presence of noisy updates. The resultant agents, called DQN Pro and Rainbow Pro, exhibit significant performance improvements over their original counterparts on the Atari benchmark, demonstrating the effectiveness of this simple idea in deep reinforcement learning. The code for our paper is available here: Github.com/amazon-research/fast-rl-with-slow-updates.
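To illustrate the general idea of keeping the online network near the target network, here is a minimal sketch of a DQN-style loss augmented with a proximal penalty on the parameter distance between the two networks. The squared-distance form of the penalty and the coefficient name `proximal_coef` are assumptions for illustration; the abstract does not specify the exact update used in the paper.

```python
import torch
import torch.nn.functional as F

def dqn_pro_style_loss(online_net, target_net, batch, gamma=0.99, proximal_coef=0.1):
    """Illustrative sketch: standard DQN TD loss plus a proximal term that
    discourages the online network from drifting far from the target network.
    The exact penalty used in the paper may differ."""
    states, actions, rewards, next_states, dones = batch

    # Standard DQN TD error, bootstrapping from the (slowly updated) target network.
    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    td_loss = F.smooth_l1_loss(q_values, targets)

    # Proximal penalty: squared distance between online and target parameters
    # (an assumed form; this is what "remain in the proximity of" suggests).
    proximal_penalty = sum(
        ((p_online - p_target) ** 2).sum()
        for p_online, p_target in zip(online_net.parameters(), target_net.parameters())
    )

    return td_loss + proximal_coef * proximal_penalty
```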