We employ Proximal Iteration for value-function optimization in deep reinforcement learning. Proximal Iteration is a computationally efficient technique that lets us bias the optimization procedure toward desirable solutions. As a concrete application, we endow the objective functions of the Deep Q-Network (DQN) and Rainbow agents with a proximal term to ensure robustness in the presence of large noise. The resulting agents, which we call DQN Pro and Rainbow Pro, exhibit significant improvements over their original counterparts on the Atari benchmark. Our results highlight the power of employing sound optimization techniques in deep reinforcement learning.
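The abstract does not spell out the modified objective, so the following is only a minimal sketch under stated assumptions. A linear Q-function stands in for the deep network, and the proximal term is assumed to be $\frac{c}{2}\lVert w - w^{-}\rVert^2$, anchored at the target-network weights $w^{-}$. The coefficient `c` and all function names (`pro_loss_and_grad`, `sgd_step`, etc.) are hypothetical choices for illustration, not the authors' code.

```python
from typing import List, Sequence, Tuple

# Weights of a linear Q-function: one row of feature weights per action.
Matrix = List[List[float]]
# (features of s, action a, reward r, features of s', terminal flag)
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]


def q_value(w: Matrix, phi: Sequence[float], a: int) -> float:
    """Linear stand-in for a deep Q-network: Q(s, a) = w[a] . phi(s)."""
    return sum(wi * xi for wi, xi in zip(w[a], phi))


def td_target(w_target: Matrix, r: float, phi_next: Sequence[float],
              done: bool, gamma: float) -> float:
    """Standard DQN target r + gamma * max_b Q(s', b; w_target)."""
    if done:
        return r
    return r + gamma * max(q_value(w_target, phi_next, b)
                           for b in range(len(w_target)))


def pro_loss_and_grad(w: Matrix, w_target: Matrix, batch: List[Transition],
                      gamma: float = 0.99, c: float = 0.1
                      ) -> Tuple[float, Matrix]:
    """Mean squared TD loss plus an ASSUMED proximal term
    (c / 2) * ||w - w_target||^2, with its gradient w.r.t. w."""
    n = len(batch)
    grad = [[0.0] * len(row) for row in w]
    td_loss = 0.0
    for phi, a, r, phi_next, done in batch:
        err = q_value(w, phi, a) - td_target(w_target, r, phi_next, done, gamma)
        td_loss += 0.5 * err * err / n
        # Only the row of the taken action receives TD gradient.
        for j, x in enumerate(phi):
            grad[a][j] += err * x / n
    prox = 0.0
    for i, (row, row_t) in enumerate(zip(w, w_target)):
        for j, (wij, tij) in enumerate(zip(row, row_t)):
            diff = wij - tij
            prox += 0.5 * c * diff * diff
            # The proximal term pulls every weight toward its anchor.
            grad[i][j] += c * diff
    return td_loss + prox, grad


def sgd_step(w: Matrix, grad: Matrix, lr: float) -> Matrix:
    """One plain gradient-descent update on the combined objective."""
    return [[wij - lr * gij for wij, gij in zip(row, grow)]
            for row, grow in zip(w, grad)]
```

Setting `c = 0` recovers the ordinary squared TD loss of DQN; a larger `c` keeps the online weights closer to the target weights between target-network updates, which is one way to read "biasing the optimization toward desirable solutions."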