The efficacy of model-free learning for robot control relies on the tailored integration of task-specific priors and heuristics, which calls for a unified approach. In this paper, we define a general class of priors called oracles and propose bounding the permissible state around the oracle's ansatz, resulting in task-agnostic oracle-guided policy optimization. Additionally, to enhance modularity, we introduce the notion of task-vital modes. A policy that masters a compact set of modes and the transitions between them can then solve perpetual tasks. The proposed approach, Oracle Guided Multi-mode Policies (OGMP), is validated on two challenging biped control tasks, parkour and diving, on a 16-DoF dynamic bipedal robot, Hector. OGMP yields a single policy per task, solving indefinite parkour over diverse tracks and omnidirectional diving from varied heights, exhibiting versatile agility. Finally, we introduce a novel latent mode space reachability analysis to study our policy's mode generalization: we compute a feasible mode set function, through which we certify a set of failure-free modes that our policy can perform at any given state.