Demonstration-Guided Reinforcement Learning with Learned Skills

Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every new task as an independent learning problem and attempt to follow the provided demonstrations step-by-step, akin to a human trying to imitate a completely unseen behavior by following the demonstrator's exact muscle movements. Naturally, such learning will be slow, but often new behaviors are not completely unseen: they share subtasks with behaviors we have previously learned. In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL. We first learn a set of reusable skills from large offline datasets of prior experience collected across many tasks. We then propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations by following the demonstrated skills instead of the primitive actions, resulting in substantial performance improvements over prior demonstration-guided RL approaches. We validate the effectiveness of our approach on long-horizon maze navigation and complex robot manipulation tasks.
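
To make the contrast between following primitive actions and following skills concrete, here is a minimal toy sketch. It is not the SkiLD algorithm: the abstract says skills are learned from large offline datasets, whereas this example assumes a small hand-written skill library (the `SKILLS` table, the action symbols `"R"` and `"U"`, and the helper names are all hypothetical). It only illustrates how a demonstration given as primitive actions can be relabeled as a shorter sequence of skills for the agent to follow.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written skill library: each skill is a fixed chunk of
# primitive actions. This is an assumption for illustration only; the paper
# learns its skills from offline data.
SKILLS: Dict[str, Tuple[str, ...]] = {
    "go_right": ("R", "R", "R"),
    "go_up": ("U", "U"),
    "step_right": ("R",),
    "step_up": ("U",),
}


def segment_into_skills(actions: List[str],
                        skills: Dict[str, Tuple[str, ...]] = SKILLS) -> List[str]:
    """Greedily relabel a primitive-action demonstration as a skill sequence."""
    # Prefer longer skills so the relabeled sequence has fewer decision steps.
    ordered = sorted(skills.items(), key=lambda kv: len(kv[1]), reverse=True)
    plan: List[str] = []
    i = 0
    while i < len(actions):
        for name, prims in ordered:
            if tuple(actions[i:i + len(prims)]) == prims:
                plan.append(name)
                i += len(prims)
                break
        else:
            # No skill in the library explains the action at position i.
            raise ValueError(f"no skill matches the demonstration at step {i}")
    return plan


def execute(plan: List[str],
            skills: Dict[str, Tuple[str, ...]] = SKILLS) -> List[str]:
    """Unroll a skill sequence back into the primitive actions it stands for."""
    return [action for name in plan for action in skills[name]]


# A toy demonstration expressed as primitive actions.
demo = ["R", "R", "R", "U", "U", "R", "U", "U", "U"]
plan = segment_into_skills(demo)
print(plan)
print(len(demo), "primitive steps ->", len(plan), "skill-level steps")
```

In this toy case the nine-step demonstration becomes five skill-level decisions, and unrolling the skill sequence with `execute` reproduces the original actions. A shorter sequence to imitate is the intuition behind following demonstrated skills rather than individual actions.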
