A popular paradigm in robotic learning is to train a policy from scratch for every new robot. This is not only inefficient but also often impractical for complex robots. In this work, we consider the problem of transferring a policy between two robots with significantly different parameters, such as kinematics and morphology. Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched across robots. In this paper, we propose $REvolveR$, a novel method that uses a continuous evolutionary model for robot policy transfer, implemented in a physics simulator. We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters. An expert policy on the source robot is transferred through training on a sequence of intermediate robots that gradually evolve into the target robot. Experiments in a physics simulator show that the proposed continuous evolutionary model can effectively transfer the policy across robots and achieve superior sample efficiency on new robots. The proposed method is especially advantageous in sparse reward settings, where exploration can be significantly reduced. Code is released at https://github.com/xingyul/revolver.
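To make the core idea concrete, below is a minimal sketch of the continuous-evolution transfer loop described above: interpolate the robot parameters between source and target, then fine-tune the policy on each intermediate robot. The helper names (`interpolate_robot`, `transfer_policy`, `finetune_fn`) are hypothetical placeholders, not the actual REvolveR API; see the linked repository for the real implementation.

```python
import numpy as np

def interpolate_robot(source_params, target_params, alpha):
    """Blend kinematic/morphological parameters of two robots (hypothetical helper).

    alpha = 0 reproduces the source robot, alpha = 1 the target robot.
    Consistent joint/link naming between the two robot models is assumed.
    """
    return {k: (1.0 - alpha) * source_params[k] + alpha * target_params[k]
            for k in source_params}

def transfer_policy(policy, source_params, target_params, finetune_fn, num_steps=20):
    """Fine-tune an expert source policy along a sequence of intermediate robots.

    `finetune_fn(policy, robot_params)` stands in for a short RL fine-tuning
    run on the simulated robot defined by `robot_params`.
    """
    for alpha in np.linspace(0.0, 1.0, num_steps + 1)[1:]:
        robot_params = interpolate_robot(source_params, target_params, alpha)
        policy = finetune_fn(policy, robot_params)  # brief adaptation at each evolution step
    return policy
```

Because each intermediate robot differs only slightly from the previous one, the policy needs only a small amount of adaptation per step, which is what makes the approach attractive in sparse-reward settings.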