Teaching robots diverse locomotion skills in complex three-dimensional environments via Reinforcement Learning (RL) remains challenging. Training agents in simple settings before moving them to complex ones has been shown to improve the training process, but so far only for relatively simple locomotion skills. In this work, we adapt the Enhanced Paired Open-Ended Trailblazer (ePOET) approach to train more complex agents to walk efficiently on complex three-dimensional terrains. First, to generate more rugged and diverse three-dimensional training terrains of increasing complexity, we extend the Compositional Pattern Producing Networks - Neuroevolution of Augmenting Topologies (CPPN-NEAT) approach to include randomized shapes. Second, we combine ePOET with Soft Actor-Critic (SAC) off-policy optimization, yielding ePOET-SAC, so that the agent can learn more diverse skills to solve more challenging tasks. Our experimental results show that the newly generated three-dimensional terrains have sufficient diversity and complexity to guide learning, that ePOET successfully learns complex locomotion skills on these terrains, and that our proposed ePOET-SAC approach slightly improves upon ePOET.
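To make the terrain-generation idea concrete, the following is a minimal sketch of a CPPN-style heightmap with randomly placed raised shapes. It is not the authors' implementation: the fixed four-weight network, the function names `cppn_height` and `make_terrain`, the 2x2 box shape, and all parameter values are assumptions chosen for illustration, and the NEAT step that would evolve the network's topology and weights is omitted entirely.

```python
import math
import random


def cppn_height(x, y, weights):
    """Hypothetical fixed-topology CPPN: composes sine, Gaussian, and tanh of (x, y)."""
    w1, w2, w3, w4 = weights
    periodic = math.sin(w1 * x + w2 * y)          # repeating ridges
    radial = math.exp(-w3 * (x * x + y * y))      # central bump
    return math.tanh(w4 * (periodic + radial))    # squash into (-1, 1)


def make_terrain(size, weights, n_boxes=0, box_height=0.0, seed=0):
    """Build a size x size heightmap, then raise n_boxes random 2x2 patches.

    The box placement stands in for the 'randomized shapes' idea; the
    actual shapes used in the paper are not specified here.
    """
    rng = random.Random(seed)
    grid = []
    for i in range(size):
        row = []
        for j in range(size):
            # Map grid indices to coordinates in [-1, 1].
            x = 2.0 * i / (size - 1) - 1.0
            y = 2.0 * j / (size - 1) - 1.0
            row.append(cppn_height(x, y, weights))
        grid.append(row)
    for _ in range(n_boxes):
        bi = rng.randrange(size - 1)
        bj = rng.randrange(size - 1)
        for i in range(bi, bi + 2):
            for j in range(bj, bj + 2):
                grid[i][j] += box_height
    return grid


if __name__ == "__main__":
    smooth = make_terrain(16, (3.0, 2.0, 1.5, 1.0))
    rugged = make_terrain(16, (3.0, 2.0, 1.5, 1.0), n_boxes=5, box_height=0.5, seed=1)
    print(len(smooth), len(smooth[0]))
    print(max(max(r) for r in rugged) >= max(max(r) for r in smooth))
```

In an open-ended setup of the kind the abstract describes, the weights (and, with NEAT, the network structure) would be mutated over time so that terrains become progressively harder; here they are simply passed in by hand.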