Learned locomotion policies can rapidly adapt to diverse environments similar to those experienced during training but lack a mechanism for fast tuning when they fail in an out-of-distribution test environment. This necessitates a slow and iterative cycle of reward and environment redesign to achieve good performance on a new task. As an alternative, we propose learning a single policy that encodes a structured family of locomotion strategies that solve training tasks in different ways, resulting in Multiplicity of Behavior (MoB). Different strategies generalize differently and can be chosen in real-time for new tasks or environments, bypassing the need for time-consuming retraining. We release a fast, robust open-source MoB locomotion controller, Walk These Ways, that can execute diverse gaits with variable footswing, posture, and speed, unlocking diverse downstream tasks: crouching, hopping, high-speed running, stair traversal, bracing against shoves, rhythmic dance, and more. Video and code release: https://gmargo11.github.io/walk-these-ways/