Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been integrated to take advantage of both methods for better exploration and exploitation. The evolutionary part of these hybrid methods maintains a population of policy networks. However, existing methods focus on optimizing the parameters of the policy network, which are usually high-dimensional and tricky for EAs. In this paper, we shift the target of evolution from the high-dimensional parameter space to the low-dimensional action space. We propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel hybrid method of EA and DRL. In EAS, we focus on optimizing the actions chosen by the policy network and attempt to obtain high-quality actions to promote policy learning through an evolutionary algorithm. We conduct several experiments on challenging continuous control tasks. The results show that EAS-TD3 achieves superior performance over other state-of-the-art methods.
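To make the idea of evolving in action space concrete, the following is a minimal, hypothetical sketch: a policy proposes an action, and a small evolutionary loop (Gaussian mutation plus elite selection, guided by a value estimate) searches the low-dimensional action space for a better action. The function names (policy, q_value, evolve_action), the placeholder critic, and the mutation-plus-selection scheme are illustrative assumptions for exposition, not the exact EAS-TD3 procedure.

```python
# Hypothetical sketch: evolve actions (not network parameters) around a policy's output.
import numpy as np

rng = np.random.default_rng(0)

def policy(state):
    # Placeholder deterministic policy: maps a state to a 2-D action in [-1, 1].
    return np.tanh(state[:2])

def q_value(state, action):
    # Placeholder critic (higher is better); in EAS-TD3 this role is played by a learned Q-network.
    return -np.sum((action - 0.3) ** 2) - 0.1 * np.sum(state ** 2)

def evolve_action(state, pop_size=16, generations=5, sigma=0.1):
    """Search the low-dimensional action space around the policy's proposal."""
    base = policy(state)
    population = np.clip(base + sigma * rng.standard_normal((pop_size, base.shape[0])), -1.0, 1.0)
    for _ in range(generations):
        fitness = np.array([q_value(state, a) for a in population])
        elites = population[np.argsort(fitness)[-pop_size // 4:]]      # keep the top quarter
        children = elites[rng.integers(len(elites), size=pop_size)]     # resample elites
        population = np.clip(children + sigma * rng.standard_normal(children.shape), -1.0, 1.0)
    fitness = np.array([q_value(state, a) for a in population])
    return population[np.argmax(fitness)]

state = rng.standard_normal(4)
better_action = evolve_action(state)
print(better_action)  # an action with higher estimated value than the raw policy output
```

In a sketch like this, the evolved high-quality actions could then be stored and used as supervision signals to promote policy learning, which is the role EAS plays alongside TD3 in the proposed method.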