In this paper, we describe an approach to achieving dynamic legged locomotion on physical robots that combines existing control methods with reinforcement learning. Specifically, our goal is a control hierarchy in which the highest-level behaviors are planned through reduced-order models, which capture the fundamental physics of legged locomotion, while lower-level controllers utilize a learned policy that can bridge the gap between the idealized, simple model and the complex, full-order robot. The high-level planner can use a model of the environment and be task-specific, while the low-level learned controller can execute a wide range of motions, so that it applies to many different tasks. In this letter, we describe this learned dynamic walking controller and show that a range of walking motions from reduced-order models can be used as the command and primary training signal for learned policies. The resulting policies do not attempt to naively track the motion (as a traditional trajectory-tracking controller would) but instead balance immediate motion tracking with long-term stability. The resulting controller is demonstrated on a human-scale, unconstrained, untethered bipedal robot at speeds up to 1.2 m/s. This letter builds the foundation of a generic, dynamic, learned walking controller that can be applied to many different tasks.