There is growing interest in using reinforcement learning to learn velocity-command tracking controllers for quadruped robots, owing to the robustness and scalability of this approach. However, a single policy trained end-to-end usually exhibits a single gait regardless of the commanded velocity. This may be a suboptimal solution, given that quadruped animals have an optimal gait that depends on their velocity. In this work, we propose a hierarchical controller for quadruped robots that can generate multiple gaits (i.e., pace, trot, and bound) while tracking velocity commands. Our controller is composed of two policies, one acting as a central pattern generator and the other as a local feedback controller, and it is trained with hierarchical reinforcement learning. Experimental results show (1) the existence of an optimal gait for specific velocity ranges and (2) the efficiency of our hierarchical controller compared to a controller composed of a single policy, which usually exhibits a single gait. Code is publicly available.
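The controller described above is learned, so the following is not the authors' central pattern generator. It is a hand-written illustration of what distinguishes the three named gaits: each gait can be described by per-leg phase offsets within one gait cycle. This minimal sketch assumes a fixed-frequency phase oscillator, a leg ordering of front-left, front-right, rear-left, rear-right, and a 50% duty factor; the names `GAIT_OFFSETS`, `leg_phases`, and `contact_schedule` are hypothetical.

```python
# Hypothetical leg ordering (an assumption): front-left, front-right, rear-left, rear-right.
LEGS = ("FL", "FR", "RL", "RR")

# Illustrative phase offsets (fraction of one gait cycle) for each gait.
GAIT_OFFSETS = {
    "trot": (0.0, 0.5, 0.5, 0.0),   # diagonal pairs (FL+RR, FR+RL) move together
    "pace": (0.0, 0.5, 0.0, 0.5),   # lateral pairs (FL+RL, FR+RR) move together
    "bound": (0.0, 0.0, 0.5, 0.5),  # front pair and rear pair move together
}


def leg_phases(gait, t, frequency):
    """Return each leg's phase in [0, 1) at time t for a fixed-frequency oscillator."""
    base = (frequency * t) % 1.0
    return tuple((base + offset) % 1.0 for offset in GAIT_OFFSETS[gait])


def contact_schedule(gait, t, frequency, duty_factor=0.5):
    """Return True for legs in stance (phase below the duty factor), False for swing."""
    return tuple(phase < duty_factor for phase in leg_phases(gait, t, frequency))


if __name__ == "__main__":
    for gait in GAIT_OFFSETS:
        stance = contact_schedule(gait, t=0.0, frequency=2.0)
        print(gait, dict(zip(LEGS, stance)))
```

At `t=0.0` this prints stance for FL and RR under trot, FL and RL under pace, and FL and FR under bound. In a hierarchical setup, a high-level policy could modulate quantities like the oscillator frequency or offsets, while a low-level policy turns the resulting phases into joint commands; that division is an assumption here, not a description of the paper's exact interface.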