REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer

A popular paradigm in robotic learning is to train a policy from scratch for every new robot. This is not only inefficient but also often impractical for complex robots. In this work, we consider the problem of transferring a policy between two robots that differ significantly in parameters such as kinematics and morphology. Existing approaches that train a new policy by matching the action or state-transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions differ across robots. In this paper, we propose a novel method named $REvolveR$ that uses continuous evolutionary models for robotic policy transfer, implemented in a physics simulator. We interpolate between the source robot and the target robot by finding a continuous evolutionary change of robot parameters. An expert policy on the source robot is transferred by training on a sequence of intermediate robots that gradually evolve into the target robot. Experiments in a physics simulator show that the proposed continuous evolutionary model can effectively transfer the policy across robots and achieve superior sample efficiency on new robots. The proposed method is especially advantageous in sparse-reward settings, where exploration can be significantly reduced.
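The interpolation idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the parameter names (`thigh_length`, `shin_length`, `foot_length`), their values, the linear interpolation rule, and the evenly spaced schedule. The abstract only says that a continuous change of robot parameters is found, so the authors' actual evolution path and training schedule may differ.

```python
from typing import Dict, List


def interpolate_robot(
    source: Dict[str, float], target: Dict[str, float], alpha: float
) -> Dict[str, float]:
    """Linearly blend two robot parameter sets (an assumed interpolation rule).

    alpha = 0.0 returns the source robot, alpha = 1.0 the target robot.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if source.keys() != target.keys():
        raise ValueError("source and target must define the same parameters")
    return {
        name: (1.0 - alpha) * source[name] + alpha * target[name]
        for name in source
    }


def evolution_schedule(num_steps: int) -> List[float]:
    """Evenly spaced alphas ending at 1.0 (a hypothetical schedule)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return [i / num_steps for i in range(1, num_steps + 1)]


# Hypothetical kinematic parameters, in metres.
source_robot = {"thigh_length": 0.45, "shin_length": 0.50, "foot_length": 0.20}
target_robot = {"thigh_length": 0.30, "shin_length": 0.70, "foot_length": 0.10}

for alpha in evolution_schedule(4):
    intermediate = interpolate_robot(source_robot, target_robot, alpha)
    # Placeholder: a real pipeline would rebuild the simulated robot from
    # `intermediate` and fine-tune the current policy on it before moving on.
    print(f"alpha={alpha:.2f}", intermediate)
```

Each intermediate robot differs only slightly from the previous one, which is the property the abstract relies on: a policy that works at one step should need little additional training to work at the next.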

