For legged robots to match the athletic capabilities of humans and animals, they must not only produce robust periodic walking and running, but also seamlessly switch between nominal locomotion gaits and more specialized transient maneuvers. Despite recent advances in the control of bipedal robots, there has been little focus on producing highly dynamic behaviors. Recent work using reinforcement learning to produce control policies for legged robots has demonstrated success in producing robust walking behaviors. However, these learned policies have difficulty expressing many different behaviors in a single network. Inspired by conventional optimization-based control techniques for legged robots, this work applies a recurrent policy to execute four-step, 90-degree turns, trained using reference data generated from optimized single-rigid-body model trajectories. We present a novel training framework that uses epilogue terminal rewards to learn specific behaviors from pre-computed trajectory data, and we demonstrate successful transfer to hardware on the bipedal robot Cassie.