Reinforcement learning (RL) can in principle let robots automatically adapt to new tasks, but current RL methods require a large number of trials to accomplish this. In this paper, we tackle rapid adaptation to new tasks through the framework of meta-learning, which utilizes past tasks to learn to adapt, with a specific focus on industrial insertion tasks. Fast adaptation is crucial because a prohibitively large number of on-robot trials can damage hardware. Effective adaptation is also feasible, since experience from different insertion applications can largely be shared across tasks. In this setting, we address two specific challenges when applying meta-learning. First, conventional meta-RL algorithms require lengthy online meta-training. We show that this can be replaced with appropriately chosen offline data, resulting in an offline meta-RL method that only requires demonstrations and trials from each of the prior tasks, without the need to run costly meta-RL procedures online. Second, meta-RL methods can fail to generalize to new tasks that are too different from those seen at meta-training time, which poses a particular challenge in industrial applications, where high success rates are critical. We address this by combining contextual meta-learning with direct online finetuning: if the new task is similar to those seen in the prior data, then the contextual meta-learner adapts immediately, and if it is too different, it gradually adapts through finetuning. We show that our approach is able to quickly adapt to a variety of different insertion tasks, with a success rate of 100% using only a fraction of the samples needed for learning the tasks from scratch. Experiment videos and details are available at https://sites.google.com/view/offline-metarl-insertion.
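The adapt-then-finetune scheme described above can be sketched at a high level as follows. This is a minimal illustrative Python sketch, not the paper's implementation: all names (`ContextualPolicy`, `infer_context`, `adapt`, the success threshold, and the placeholder update step) are hypothetical stand-ins for the contextual meta-learner and the online RL finetuning procedure.

```python
# Hypothetical sketch: contextual adaptation first, online finetuning as fallback.
# Names and the toy "gradient step" are illustrative, not the paper's method.
import numpy as np

class ContextualPolicy:
    """Toy policy conditioned on a task embedding z inferred from prior demos."""
    def __init__(self, dim=4, seed=0):
        self.w = np.random.default_rng(seed).normal(size=dim)  # meta-trained weights (stand-in)

    def infer_context(self, demos):
        # Contextual meta-learning: embed the new task from a handful of demonstrations.
        return np.mean(demos, axis=0)

    def act(self, obs, z):
        return float(self.w @ (obs + z))

def adapt(policy, demos, eval_success, threshold=0.9, finetune_steps=10, lr=0.1):
    """Return (task embedding, number of finetuning steps used).

    If the contextual meta-learner already succeeds on the new task, no
    finetuning happens; otherwise the policy is gradually finetuned online.
    """
    z = policy.infer_context(demos)
    if eval_success(policy, z) >= threshold:      # new task is close to the prior data
        return z, 0
    for step in range(finetune_steps):            # task too different: finetune gradually
        policy.w += lr * np.ones_like(policy.w)   # placeholder for an RL update step
        if eval_success(policy, z) >= threshold:
            return z, step + 1
    return z, finetune_steps
```

The key design point mirrored here is that finetuning is only invoked when the contextual adaptation alone fails, so tasks covered by the prior data incur zero additional on-robot trials.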