Long-horizon robot learning tasks with sparse rewards pose a significant challenge for current reinforcement learning algorithms. A key feature enabling humans to learn challenging control tasks is that they often receive expert intervention, which lets them understand the high-level structure of a task before mastering its low-level control actions. We propose a framework for leveraging expert intervention to solve long-horizon reinforcement learning tasks. We consider \emph{option templates}: specifications that encode a potential option which can be trained using reinforcement learning. We formulate expert intervention as allowing the agent to execute option templates before learning an implementation. This enables the agent to use an option before committing costly resources to learning it. We evaluate our approach on three challenging reinforcement learning problems, showing that it outperforms state-of-the-art approaches by two orders of magnitude. Videos of trained agents and our code can be found at: https://sites.google.com/view/stickymittens
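To make the core idea concrete, the notion of executing an option template before an implementation has been learned can be sketched as follows. This is a minimal illustration, not the paper's actual code: all class, field, and function names here (`OptionTemplate`, `initiation`, `termination`, `oracle`, `policy`) are illustrative assumptions, and the "oracle" stands in for expert intervention that simply achieves the option's subgoal.

```python
# Hedged sketch (illustrative names, not the paper's implementation):
# an option template specifies where an option applies and what subgoal
# it achieves; before a policy is trained, an expert "oracle" executes it.

from dataclasses import dataclass
from typing import Callable, Optional, Tuple

State = Tuple[float, ...]  # illustrative: a state is a tuple of floats

@dataclass
class OptionTemplate:
    """Specification of a potential option that may later be trained with RL."""
    initiation: Callable[[State], bool]          # states where the option may start
    termination: Callable[[State], bool]         # subgoal the option should reach
    oracle: Callable[[State], State]             # expert intervention: jump to subgoal
    policy: Optional[Callable[[State], State]] = None  # learned implementation, if any

    def execute(self, state: State) -> State:
        """Run the learned policy if one exists, else fall back to the oracle."""
        if not self.initiation(state):
            raise ValueError("option not applicable in this state")
        step = self.policy if self.policy is not None else self.oracle
        next_state = step(state)
        if not self.termination(next_state):
            raise RuntimeError("option failed to reach its subgoal")
        return next_state

# Example: a one-dimensional "reach x >= 5" option.  Before training, the
# oracle satisfies the template, so high-level learning can proceed.
move_right = OptionTemplate(
    initiation=lambda s: s[0] < 5,
    termination=lambda s: s[0] >= 5,
    oracle=lambda s: (5.0,),
)
print(move_right.execute((0.0,)))  # → (5.0,)
```

Once a low-level policy is trained for the template, assigning it to the `policy` field swaps out the oracle without changing the high-level agent's interface to the option.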