Deep reinforcement learning (RL) based controllers for legged robots have demonstrated impressive robustness for walking in different environments across several robot platforms. To enable the application of RL policies to humanoid robots in real-world settings, it is crucial to build a system that can achieve robust walking in any direction, on 2D and 3D terrains, and that is controllable by user commands. In this paper, we tackle this problem by learning a policy to follow a given step sequence. The policy is trained with the help of a set of procedurally generated step sequences (also called footstep plans). We show that simply feeding the upcoming two steps to the policy is sufficient to achieve omnidirectional walking, turning in place, standing, and climbing stairs. Our method employs curriculum learning on the complexity of terrains, and circumvents the need for reference motions or pre-trained weights. We demonstrate the application of our proposed method to learn RL policies for two new robot platforms, HRP5P and JVRC-1, in the MuJoCo simulation environment. The code for training and evaluation is available online.
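The idea of conditioning the policy on the upcoming two steps can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's actual implementation: the function name, the (x, y, z, yaw) footstep encoding, and the padding behavior when fewer than two steps remain are all assumptions.

```python
import numpy as np

def build_observation(robot_state, footstep_plan, step_index):
    """Hypothetical sketch: append the next two target footsteps,
    each encoded as (x, y, z, yaw) in the robot's frame, to the
    proprioceptive state vector fed to the policy.

    robot_state:   1-D array of proprioceptive features (assumed).
    footstep_plan: (N, 4) array of footstep targets (assumed encoding).
    step_index:    index of the next step to be taken.
    """
    upcoming = footstep_plan[step_index:step_index + 2]
    # If fewer than two steps remain (e.g. end of plan, or standing),
    # repeat the final step so the observation has a fixed size.
    while len(upcoming) < 2:
        upcoming = np.vstack([upcoming, footstep_plan[-1]])
    return np.concatenate([robot_state, upcoming.flatten()])
```

A fixed-size observation is what makes this convenient: the policy network never sees the full plan, only a two-step window that slides forward as steps are completed, which is also how turning in place and standing can be expressed as degenerate footstep plans.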