In robotics, it is often not possible to learn useful policies with pure model-free reinforcement learning unless significant reward shaping or curriculum learning is applied. As a consequence, many researchers rely on expert demonstrations to guide learning. However, acquiring expert demonstrations can be expensive. This paper proposes an alternative approach in which the solutions of previously solved tasks are used to produce an action prior that facilitates exploration in future tasks. The action prior is a probability distribution over actions that summarizes the set of policies found while solving previous tasks. Our results indicate that this approach can be used to solve robotic manipulation problems that would otherwise be infeasible without expert demonstrations.
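To make the idea concrete, the following is a minimal sketch, not the paper's implementation, of how an action prior could be built from previously solved tasks and then used to bias exploration. It assumes discrete state and action spaces and that each prior task yields a greedy policy represented by a Q-table; the function names `build_action_prior` and `explore_with_prior` and the pseudo-count parameter `alpha` are illustrative assumptions, not the paper's API.

```python
# Sketch: forming an action prior from previously solved tasks and using it
# to bias exploratory action selection. Assumptions: discrete states/actions,
# prior tasks summarized by Q-tables; all names here are hypothetical.
import numpy as np

def build_action_prior(q_tables, alpha=1.0):
    """Count, per state, how often each action was greedy across prior tasks,
    then normalize (with pseudo-count alpha) into a probability distribution
    over actions for every state."""
    n_states, n_actions = q_tables[0].shape
    counts = np.full((n_states, n_actions), alpha)
    for q in q_tables:
        greedy = q.argmax(axis=1)
        counts[np.arange(n_states), greedy] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def explore_with_prior(prior, state, q, epsilon=0.1, rng=None):
    """Epsilon-greedy selection where the exploratory draw is sampled from
    the action prior instead of the uniform distribution."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.choice(len(prior[state]), p=prior[state]))
    return int(q[state].argmax())

# Toy usage: an action prior built from two previously solved tasks
# over 4 states and 3 actions, then used when acting in a new task.
rng = np.random.default_rng(0)
prior = build_action_prior([rng.normal(size=(4, 3)) for _ in range(2)])
action = explore_with_prior(prior, state=0, q=rng.normal(size=(4, 3)), rng=rng)
```

Under these assumptions, states in which previous tasks agreed on an action concentrate the prior's probability mass there, so exploration in a new task is steered toward actions that were useful before rather than toward uniformly random ones.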