Actor learning and critic learning are the two components of Deep Deterministic Policy Gradient (DDPG), a widely used reinforcement learning method. Because actor and critic learning play a significant role in the robot's overall learning, the performance of DDPG is relatively sensitive and unstable. To further improve the performance and stability of DDPG, we propose a multi-actor-critic DDPG for more reliable actor-critic learning. This multi-actor-critic DDPG is then integrated with Hindsight Experience Replay (HER) to form a new deep learning framework called AACHER. AACHER replaces the single actor or critic in DDPG with the average of multiple actors or critics, which makes learning more robust when any one actor or critic performs poorly. Multiple independent actors and critics can also gather knowledge from the environment more broadly. We evaluated AACHER on the goal-based environments AuboReach, FetchReach-v1, FetchPush-v1, FetchSlide-v1, and FetchPickAndPlace-v1. In our experiments we used various actor/critic combinations, among which A10C10 and A20C20 performed best. Overall, AACHER outperforms the conventional algorithm (DDPG+HER) for every actor/critic combination evaluated. On FetchPickAndPlace-v1, A20C20 achieves a success rate roughly 3.8 times that of DDPG+HER.
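The averaging idea can be illustrated with a minimal sketch, assuming the ensemble output is a plain element-wise mean of actor actions and a plain mean of critic Q-values, plugged into a DDPG-style target y = r + γ·Q̄(s′, μ̄(s′)). The linear actors and critics, the helper names (`make_linear_actor`, `make_linear_critic`, `averaged_action`, `averaged_q`, `td_target`), and the default γ are hypothetical stand-ins for neural networks, not the authors' implementation; target networks and HER goal relabeling are omitted. A combination such as A10C10 would simply correspond to lists of 10 actors and 10 critics.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Vector = List[float]
Actor = Callable[[Sequence[float]], Vector]
Critic = Callable[[Sequence[float], Sequence[float]], float]


def make_linear_actor(weights: List[List[float]]) -> Actor:
    """Hypothetical deterministic actor: action = weights @ state."""
    def act(state: Sequence[float]) -> Vector:
        return [sum(w * s for w, s in zip(row, state)) for row in weights]
    return act


def make_linear_critic(weights: List[float]) -> Critic:
    """Hypothetical critic: Q(s, a) = weights . concat(s, a)."""
    def q(state: Sequence[float], action: Sequence[float]) -> float:
        features = list(state) + list(action)
        return sum(w * x for w, x in zip(weights, features))
    return q


def averaged_action(actors: Sequence[Actor], state: Sequence[float]) -> Vector:
    """Element-wise mean of every actor's action (stands in for the single DDPG actor)."""
    actions = [actor(state) for actor in actors]
    return [fmean(dim) for dim in zip(*actions)]


def averaged_q(critics: Sequence[Critic], state: Sequence[float],
               action: Sequence[float]) -> float:
    """Mean Q-value over every critic (stands in for the single DDPG critic)."""
    return fmean(critic(state, action) for critic in critics)


def td_target(reward: float, next_state: Sequence[float],
              actors: Sequence[Actor], critics: Sequence[Critic],
              gamma: float = 0.98, done: bool = False) -> float:
    """DDPG-style target y = r + gamma * Q_avg(s', mu_avg(s')); no target networks here."""
    if done:
        return reward
    next_action = averaged_action(actors, next_state)
    return reward + gamma * averaged_q(critics, next_state, next_action)


if __name__ == "__main__":
    actors = [make_linear_actor([[1.0, 0.0]]), make_linear_actor([[0.0, 1.0]])]
    critics = [make_linear_critic([1.0, 1.0, 1.0]), make_linear_critic([0.0, 0.0, 2.0])]
    state = [2.0, 4.0]
    print(averaged_action(actors, state))                      # [3.0]
    print(averaged_q(critics, state, [3.0]))                   # 7.5
    print(td_target(-1.0, state, actors, critics, gamma=0.5))  # 2.75
```

Because each member contributes only 1/N of the averaged action or Q-value, a single poorly trained actor or critic shifts the ensemble output less than it would shift a lone network, which is the robustness argument made above.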