In this paper, we propose a novel framework for synthesizing a single multimodal control policy that generates diverse behaviors (or modes), along with emergent transition maneuvers between them, for bipedal locomotion. We first learn an efficient latent encoding for each behavior by training an autoencoder on a dataset of rough reference motions. These latent encodings then serve as commands for training a multimodal policy, with modes and transitions sampled adaptively to ensure consistent performance across behaviors. We validate the policy in simulation on distinct locomotion modes such as walking, leaping, jumping onto a block, and standing idle, and on all possible inter-mode transitions. Finally, we integrate a task-based planner that rapidly generates open-loop mode plans for the trained multimodal policy to solve high-level tasks such as reaching a goal position on challenging terrain. Complex parkour-like motions that smoothly combine the discrete locomotion modes were generated in 3 minutes to traverse tracks with a gap of width 0.45 m, a plateau of height 0.2 m, and a block of height 0.4 m, all of which are significant relative to the dimensions of our mini-biped platform.
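To make the idea of adaptively sampling modes and transitions concrete, the following is a minimal sketch and not the paper's actual scheme: the abstract does not state the sampling rule, so the weighting used here (sample poorly performing mode pairs more often, with a small floor so no pair is starved) is an assumption. The names `AdaptiveSampler`, `MODES`, `smoothing`, and `floor` are hypothetical, as is the convention that a pair `(m, m)` stands for remaining in mode `m`.

```python
import random

# Mode names taken from the abstract; the identifiers themselves are hypothetical.
MODES = ["walk", "leap", "jump_on_block", "stand"]


class AdaptiveSampler:
    """Samples (from_mode, to_mode) pairs, favoring pairs with low performance.

    Assumed rule (not from the paper): weight = (1 - score) + floor,
    where score is a running performance estimate in [0, 1].
    """

    def __init__(self, modes, smoothing=0.1, floor=0.05, seed=0):
        # All ordered pairs; (m, m) is assumed to mean "stay in mode m".
        self.pairs = [(a, b) for a in modes for b in modes]
        self.score = {p: 0.0 for p in self.pairs}
        self.smoothing = smoothing  # step size of the running average
        self.floor = floor          # keeps every pair's weight above zero
        self.rng = random.Random(seed)

    def weights(self):
        # Normalized sampling probabilities, in the same order as self.pairs.
        raw = [1.0 - self.score[p] + self.floor for p in self.pairs]
        total = sum(raw)
        return [w / total for w in raw]

    def sample(self):
        # Draw one pair according to the current weights.
        return self.rng.choices(self.pairs, weights=self.weights(), k=1)[0]

    def update(self, pair, performance):
        # Exponential moving average of a performance value clipped to [0, 1].
        performance = min(1.0, max(0.0, performance))
        old = self.score[pair]
        self.score[pair] = (1.0 - self.smoothing) * old + self.smoothing * performance
```

In a training loop under these assumptions, each episode would call `sample()` to pick the mode pair to command, roll out the policy, and pass a normalized return to `update()`, so that pairs the policy already handles well are visited less often.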