Reliable bipedal walking over complex terrain is a challenging problem, and a curriculum can make it easier to learn. Curriculum learning starts with an achievable version of a task and increases the difficulty as success criteria are met. We propose a three-stage curriculum to train Deep Reinforcement Learning policies for bipedal walking over various challenging terrains. In the first stage, the agent starts on easy terrain, and the terrain difficulty is gradually increased while forces derived from a target policy are applied to the robot joints and base. In the second stage, the guiding forces are gradually reduced to zero. Finally, in the third stage, random perturbations of increasing magnitude are applied to the robot base to improve the robustness of the policies. In simulation experiments, we show that our approach is effective in learning a separate walking policy for each of five terrain types: flat, hurdles, gaps, stairs, and steps. Moreover, we demonstrate that, in the absence of human demonstrations, a simple hand-designed walking trajectory is a sufficient prior for learning to traverse complex terrain types. In ablation studies, we show that removing any one of the three stages of the curriculum degrades learning performance.
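The stage ordering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `Curriculum`, its fields, the normalized 0-to-1 scales, and the fixed increment of 0.25 are all assumptions made for illustration, and the success criterion is reduced to a boolean supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class Curriculum:
    """Hypothetical three-stage curriculum state; all values are illustrative."""
    stage: int = 1
    terrain_difficulty: float = 0.0  # 0 = easiest terrain, 1 = full difficulty
    guide_scale: float = 1.0         # multiplier on the guiding forces
    perturb_scale: float = 0.0       # multiplier on random base perturbations
    step: float = 0.25               # assumed change per met success criterion

    def update(self, success: bool) -> None:
        """Move the curriculum one notch when the success criterion is met."""
        if not success:
            return
        if self.stage == 1:
            # Stage 1: raise terrain difficulty while guiding forces stay on.
            self.terrain_difficulty = min(1.0, self.terrain_difficulty + self.step)
            if self.terrain_difficulty >= 1.0:
                self.stage = 2
        elif self.stage == 2:
            # Stage 2: reduce the guiding forces to zero.
            self.guide_scale = max(0.0, self.guide_scale - self.step)
            if self.guide_scale <= 0.0:
                self.stage = 3
        else:
            # Stage 3: increase the magnitude of random perturbations.
            self.perturb_scale = min(1.0, self.perturb_scale + self.step)
```

In a training loop, `update` would be called after each evaluation, and the three scale fields would be read by the simulator when generating terrain and applying forces. How the forces themselves are computed from the target policy is not specified here.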