To learn a dynamics model for a task in a new environment efficiently, one can adapt a model learned in a similar source environment. However, existing adaptation methods can fail when the target dataset contains transitions whose dynamics differ greatly from those of the source environment. For example, the source dynamics could describe a rope manipulated in free space, whereas the target dynamics could involve collisions and deformation against obstacles. Our key insight is to improve data efficiency by focusing model adaptation only on the regions where the source and target dynamics are similar. In the rope example, adapting the free-space dynamics alone requires significantly less data than adapting the free-space dynamics while also learning the collision dynamics. We propose a new adaptation method that is effective at adapting to regions of similar dynamics. Additionally, we combine this adaptation method with prior work on planning with unreliable dynamics to obtain a method for data-efficient online adaptation, called FOCUS. We first demonstrate that the proposed adaptation method achieves statistically significantly lower prediction error in regions of similar dynamics on simulated rope manipulation and plant watering tasks. We then show on a bimanual rope manipulation task that FOCUS achieves data-efficient online learning, both in simulation and in the real world.