Dynamic quadruped locomotion over challenging terrains with precise foot placements is a hard problem for both optimal control methods and Reinforcement Learning (RL). Non-linear solvers can produce coordinated, constraint-satisfying motions, but often take too long to converge for online application. RL methods can learn dynamic reactive controllers but require carefully tuned shaping rewards to produce good gaits and can have trouble discovering precise coordinated movements. Imitation learning circumvents this problem and has been used with motion capture data to extract quadruped gaits for flat terrains. However, acquiring motion capture data for a large variety of terrains with height differences would be costly. In this work, we combine the advantages of trajectory optimization and learning methods and show that terrain-adaptive controllers can be obtained by training policies to imitate trajectories that have been planned over procedural terrains by a non-linear solver. We show that the learned policies transfer to unseen terrains and can be fine-tuned to dynamically traverse challenging terrains that require precise foot placements and are very hard to solve with standard RL.
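To make the imitation objective concrete, the sketch below shows one common way such a tracking reward can be computed: the policy is rewarded for keeping the simulated robot state close to the corresponding frame of the solver-planned reference trajectory. This is a minimal illustration, not the paper's exact formulation; the state keys, exponential kernels, and weights are assumptions chosen for clarity.

```python
import numpy as np

def imitation_reward(sim_state, ref_state, weights=(0.4, 0.3, 0.3)):
    """Illustrative tracking reward against a trajectory-optimization reference.

    `sim_state` and `ref_state` are dicts with hypothetical keys:
      'joint_pos' -- joint angles (rad)
      'base_pos'  -- base position in the world frame (m)
      'foot_pos'  -- stacked foot positions in the world frame (m)
    The kernel scales below are placeholders, not tuned values.
    """
    w_joint, w_base, w_foot = weights
    # Each term decays exponentially with the squared tracking error,
    # so the reward stays bounded in [0, 1].
    r_joint = np.exp(-5.0  * np.sum((sim_state['joint_pos'] - ref_state['joint_pos']) ** 2))
    r_base  = np.exp(-20.0 * np.sum((sim_state['base_pos']  - ref_state['base_pos'])  ** 2))
    r_foot  = np.exp(-40.0 * np.sum((sim_state['foot_pos']  - ref_state['foot_pos'])  ** 2))
    return w_joint * r_joint + w_base * r_base + w_foot * r_foot
```

The exponential kernels make the reward dense and bounded, which is one reason imitation terms of this form are easier to optimize than sparse task rewards; weighting the foot-position term most heavily would emphasize the precise foot placements the terrains demand.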