Model-free deep reinforcement learning (RL) algorithms have been widely used for a range of complex control tasks. However, slow convergence and sample inefficiency remain challenging problems in RL, especially when handling continuous and high-dimensional state spaces. To tackle this problem, we propose a general acceleration method for model-free, off-policy deep RL algorithms by drawing on the idea underlying regularized Anderson acceleration (RAA), an effective approach to accelerating the solution of fixed-point problems with perturbations. Specifically, we first show how Anderson acceleration can be applied directly to policy iteration. Then we extend RAA to the deep RL setting by introducing a regularization term that controls the impact of the perturbations induced by function approximation errors. We further propose two strategies, i.e., progressive update and adaptive restart, to enhance performance. The effectiveness of our method is evaluated on a variety of benchmark tasks, including Atari 2600 and MuJoCo. Experimental results show that our approach substantially improves both the learning speed and the final performance of state-of-the-art deep RL algorithms.
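To make the idea concrete, a minimal sketch of one regularized Anderson step for a generic fixed-point iteration $x_{k+1} = g(x_k)$ is given below; the notation (memory size $m$, residuals $r_j = g(x_j) - x_j$, regularization coefficient $\lambda$) is illustrative and does not prescribe the exact formulation developed later in the paper:
\begin{align*}
\alpha^{\star} &= \operatorname*{arg\,min}_{\alpha \in \mathbb{R}^{m+1},\; \mathbf{1}^{\top}\alpha = 1} \Big\| \sum_{j=0}^{m} \alpha_j \, r_{k-m+j} \Big\|_2^2 + \lambda \|\alpha\|_2^2, \\
x_{k+1} &= \sum_{j=0}^{m} \alpha_j^{\star} \, g(x_{k-m+j}),
\end{align*}
where the Tikhonov term $\lambda \|\alpha\|_2^2$ keeps the mixing coefficients well-behaved when the residuals are perturbed, e.g., by function approximation errors.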