Hierarchical learning has been successful at learning generalizable locomotion skills on walking robots in a sample-efficient manner. However, the low-dimensional "latent" action used to communicate between the two layers of the hierarchy is typically user-designed. In this work, we present a fully-learned hierarchical framework that is capable of jointly learning the low-level controller and the high-level latent action space. Once this latent space is learned, we plan over continuous latent actions in a model-predictive control fashion, using a learned high-level dynamics model. This framework generalizes to multiple robots, and we present results on a Daisy hexapod simulation, an A1 quadruped simulation, and Daisy robot hardware. We compare a range of learned hierarchical approaches from the literature, and show that our framework outperforms baselines on multiple tasks in both simulations. In addition to learning approaches, we also compare to inverse kinematics (IK) acting on the desired robot motion, and show that our fully-learned framework outperforms IK in adverse settings on both the A1 and Daisy simulations. On hardware, we show the Daisy hexapod achieving multiple locomotion tasks in an unstructured outdoor setting with only 2000 hardware samples, reinforcing the robustness and sample efficiency of our approach.
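To make the planning step concrete, below is a minimal sketch of model-predictive control over continuous latent actions via random shooting, assuming a learned high-level dynamics model `dynamics_fn(state, latent_action) -> next_state` and a task reward `reward_fn(state)`. The function names, sampling scheme, and toy example are illustrative placeholders, not the paper's actual implementation.

```python
# Minimal sketch (assumptions noted above): random-shooting MPC over a learned
# latent-action dynamics model. dynamics_fn and reward_fn are hypothetical
# stand-ins for the learned high-level model and the task reward.
import numpy as np

def random_shooting_mpc(state, dynamics_fn, reward_fn, latent_dim,
                        horizon=5, num_samples=256, rng=None):
    """Return the first latent action of the best sampled latent-action sequence."""
    rng = rng or np.random.default_rng()
    # Sample candidate sequences of continuous latent actions in [-1, 1].
    candidates = rng.uniform(-1.0, 1.0, size=(num_samples, horizon, latent_dim))
    returns = np.zeros(num_samples)
    for i, seq in enumerate(candidates):
        s = state
        for z in seq:
            s = dynamics_fn(s, z)       # roll out the learned high-level dynamics
            returns[i] += reward_fn(s)  # accumulate predicted task reward
    best = candidates[np.argmax(returns)]
    return best[0]  # execute only the first latent action, then replan


if __name__ == "__main__":
    # Toy stand-ins: a 2-D "robot position" state; the latent action nudges it.
    dyn = lambda s, z: s + 0.1 * z
    goal = np.array([1.0, 0.0])
    rew = lambda s: -np.linalg.norm(s - goal)
    z0 = random_shooting_mpc(np.zeros(2), dyn, rew, latent_dim=2)
    print("first latent action:", z0)
```

In this receding-horizon scheme, only the first latent action of the best-scoring sequence is passed to the low-level controller before replanning, which is the standard MPC pattern the abstract refers to.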