This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world. We present a multi-task reinforcement learning framework that trains the robot to accomplish a wide variety of jumping tasks, such as jumping to different locations and in different directions. To improve performance on these challenging tasks, we develop a new policy structure that encodes the robot's long-term input/output (I/O) history while also providing direct access to its short-term I/O history. To train a versatile multi-task policy, we use a multi-stage training scheme in which each stage targets a different objective. After multi-stage training, the multi-task policy can be transferred directly to Cassie, a physical bipedal robot. Training on different tasks and exploring more diverse scenarios lead to highly robust policies that can exploit the diverse set of learned skills to recover from perturbations or poor landings during real-world deployment. This robustness enables Cassie to complete a variety of challenging jumping tasks in the real world, such as standing long jumps, jumps onto elevated platforms, and multi-axis jumps.
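The policy structure described in the abstract, an encoded long-term I/O history combined with direct access to the short-term I/O history, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class name `DualHistoryPolicy`, the linear-plus-tanh encoder, the layer sizes, the random untrained weights, and the idea that the jumping task is passed in as a `command` vector.

```python
import numpy as np


class DualHistoryPolicy:
    """Hypothetical sketch of a dual-history policy (not the authors' network).

    The long I/O history is compressed into a small latent vector, while the
    short I/O history bypasses the encoder and reaches the policy head raw.
    """

    def __init__(self, io_dim, long_len, short_len, cmd_dim, act_dim,
                 latent_dim=16, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.long_len = long_len
        self.short_len = short_len
        # Assumed encoder: flatten the long history and project it linearly.
        self.W_enc = rng.normal(0.0, 0.1, (long_len * io_dim, latent_dim))
        # The head sees the latent, the raw short history, and the task command.
        in_dim = latent_dim + short_len * io_dim + cmd_dim
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.W2 = rng.normal(0.0, 0.1, (hidden_dim, act_dim))

    def act(self, io_history, command):
        """io_history: array of shape (T, io_dim), newest row last, T >= long_len."""
        long_hist = io_history[-self.long_len:]
        short_hist = io_history[-self.short_len:]
        # Long-term history goes through the encoder.
        latent = np.tanh(long_hist.reshape(-1) @ self.W_enc)
        # Short-term history is concatenated without any encoding.
        x = np.concatenate([latent, short_hist.reshape(-1), command])
        hidden = np.tanh(x @ self.W1)
        return np.tanh(hidden @ self.W2)  # actions bounded in [-1, 1]


# Example call with made-up dimensions and an all-zero history.
policy = DualHistoryPolicy(io_dim=6, long_len=50, short_len=4, cmd_dim=3, act_dim=2)
history = np.zeros((50, 6))
action = policy.act(history, np.array([0.5, 0.0, 0.0]))
```

In this sketch the most recent rows of the history influence the action through two paths (encoded and raw), whereas older rows influence it only through the encoder's latent vector.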