Multi-goal reinforcement learning (RL) aims to enable an agent to accomplish multi-goal tasks, which is of great importance for learning scalable robotic manipulation skills. However, reward engineering in multi-goal RL always requires strenuous effort. Moreover, it inevitably introduces bias that leads to a suboptimal final policy. Sparse rewards provide a simple yet effective way to overcome these limitations. Nevertheless, they harm exploration efficiency and can even prevent the policy from converging. In this paper, we propose a density-based curriculum learning method for efficient exploration with sparse rewards and better generalization to the desired goal distribution. Intuitively, our method encourages the robot to gradually broaden the frontier of its ability so as to cover the entire desired goal space as quickly and completely as possible. To further improve data efficiency and generality, we augment the goals and transitions within the allowed region during training. Finally, we evaluate our method on diversified variants of benchmark manipulation tasks that are challenging for existing methods. Empirical results show that our method outperforms state-of-the-art baselines in terms of both data efficiency and success rate.
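To make the intuition of a density-based curriculum concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): a kernel density estimate is fit over goals the agent has already achieved, and candidate goals lying in low-density regions, i.e. at the frontier of the agent's current ability, are preferentially selected as training goals. All function names and parameters here are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def select_frontier_goals(achieved_goals, candidate_goals, num_goals, bandwidth=0.1):
    """Hypothetical density-based curriculum goal selection.

    Fits a Gaussian KDE over goals the agent has already achieved and returns
    the candidate goals with the lowest estimated density, i.e. goals at the
    frontier of the agent's current ability.
    """
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(achieved_goals)                              # density of "mastered" goals
    log_density = kde.score_samples(candidate_goals)     # log-density of candidates
    frontier_idx = np.argsort(log_density)[:num_goals]   # lowest-density goals first
    return candidate_goals[frontier_idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    achieved = rng.normal(0.0, 0.2, size=(500, 3))       # goals already reached
    candidates = rng.uniform(-1.0, 1.0, size=(200, 3))   # samples from the desired goal space
    goals = select_frontier_goals(achieved, candidates, num_goals=16)
    print(goals.shape)  # (16, 3)
```

In this sketch the bandwidth controls how far beyond the achieved-goal distribution the selected frontier goals may lie; the paper's method may use a different density estimator or selection rule.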