Recent work has demonstrated the success of reinforcement learning (RL) for training bipedal locomotion policies for real robots. Prior work, however, has focused on learning joint-coordination controllers whose objective is to track joint trajectories produced by already-available controllers. As such, it is difficult to use these approaches to achieve higher-level goals of legged locomotion, such as directly specifying desired end-effector foot movement or ground reaction forces. In this work, we propose an approach for integrating knowledge of the robot system into RL to allow learning at the level of task-space actions, expressed as foot setpoints. In particular, we combine learning a task-space policy with a model-based inverse dynamics controller, which translates task-space actions into joint-level controls. With this natural action space for learning locomotion, the approach is more sample-efficient and better realizes the desired task-space dynamics than learning purely joint-space actions. We demonstrate the approach in simulation and also show that the learned policies are able to transfer to the real bipedal robot Cassie. This result encourages further research toward incorporating bipedal control techniques into the structure of the learning process to enable dynamic behaviors.
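The policy-to-torque mapping described above can be sketched as follows. This is a minimal illustration of a standard resolved-acceleration inverse dynamics layer, not the authors' implementation; the function name, the pseudoinverse-based formulation, and the toy matrices are all assumptions introduced for exposition:

```python
import numpy as np

def task_space_to_joint_torques(M, h, J, Jdot_qdot, xddot_des):
    """Translate a desired task-space (foot) acceleration into joint torques.

    Uses the standard inverse dynamics relation tau = M(q) qddot + h(q, qdot),
    where the joint acceleration is resolved from the task-space target via
    qddot = J^+ (xddot_des - Jdot * qdot).

    M         : (n, n) joint-space inertia matrix
    h         : (n,)   Coriolis, centrifugal, and gravity terms
    J         : (m, n) task-space Jacobian of the foot
    Jdot_qdot : (m,)   Jacobian time-derivative times joint velocity
    xddot_des : (m,)   desired task-space acceleration from the policy
    """
    qddot = np.linalg.pinv(J) @ (xddot_des - Jdot_qdot)
    return M @ qddot + h

# Toy example: with identity dynamics the torque equals the task-space target.
tau = task_space_to_joint_torques(
    M=np.eye(2), h=np.zeros(2),
    J=np.eye(2), Jdot_qdot=np.zeros(2),
    xddot_des=np.array([1.0, 2.0]),
)
```

In this structure the RL policy only outputs `xddot_des` (or foot setpoints tracked by a task-space PD law that produces it), while the model-based layer handles the conversion to joint torques.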