A popular paradigm in robotic learning is to train a policy from scratch for every new robot. This is not only inefficient but also often impractical for complex robots. In this work, we consider the problem of transferring a policy between two robots with significantly different parameters such as kinematics and morphology. Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched across different robots. In this paper, we propose a novel method named $REvolveR$ that uses continuous evolutionary models for robotic policy transfer, implemented in a physics simulator. We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters. An expert policy on the source robot is transferred by training on a sequence of intermediate robots that gradually evolve into the target robot. Experiments show that the proposed continuous evolutionary model can effectively transfer policies across robots and achieve superior sample efficiency on new robots in a physics simulator. The proposed method is especially advantageous in sparse-reward settings, where exploration can be significantly reduced.
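To make the interpolation-and-fine-tuning idea concrete, the following is a minimal sketch of an evolution-based transfer loop. It assumes each robot can be described by a dictionary of named scalar parameters (e.g., link lengths, joint limits) and uses hypothetical `make_env` and `finetune` callables supplied by the caller; it illustrates the general scheme under these assumptions and is not the authors' implementation.

```python
import numpy as np


def interpolate_robot(source_params, target_params, alpha):
    """Linearly blend named scalar robot parameters.

    alpha = 0 yields the source robot, alpha = 1 the target robot.
    Assumes both robots expose the same named scalar parameters;
    real kinematic or morphological changes may need a richer correspondence.
    """
    return {
        name: (1.0 - alpha) * source_params[name] + alpha * target_params[name]
        for name in source_params
    }


def evolve_policy(policy, source_params, target_params,
                  make_env, finetune, num_stages=100, steps_per_stage=10_000):
    """Fine-tune `policy` on a sequence of intermediate robots that
    gradually evolve from the source robot into the target robot.

    `make_env` and `finetune` are hypothetical hooks: one builds a
    simulator environment for a given robot, the other runs a short
    RL fine-tuning stage and returns the updated policy.
    """
    for alpha in np.linspace(0.0, 1.0, num_stages + 1)[1:]:
        robot = interpolate_robot(source_params, target_params, alpha)
        env = make_env(robot)                            # simulator env for this intermediate robot
        policy = finetune(policy, env, steps_per_stage)  # brief fine-tuning on the intermediate robot
    return policy
```

Warm-starting each stage from the previous stage's policy is what keeps exploration small: every intermediate robot differs only slightly from the one the policy was just trained on.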