Multi-goal policy learning for robotic manipulation is challenging. Prior successes have used state-based representations of the objects or provided demonstration data to facilitate learning. In this paper, by hand-coding a high-level discrete representation of the domain, we show that policies to reach dozens of goals can be learned with a single network using Q-learning from pixels. The agent focuses learning on simpler, local policies which are sequenced together by planning in the abstract space. We compare our method against standard multi-goal RL baselines, as well as other methods that leverage the discrete representation, on a challenging block construction domain. We find that our method can build more than a hundred different block structures, and demonstrate forward transfer to structures with novel objects. Lastly, we deploy the policy learned in simulation on a real robot.
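The following is a minimal, hypothetical sketch of the two-level structure the abstract describes: a hand-coded abstract space is searched to produce a sequence of subgoals, and a single goal-conditioned Q-function learns the simpler local policies that reach each subgoal. The toy block-slot domain, the tabular stand-in for the pixel-based network, and all function names are illustrative assumptions, not the paper's implementation.

```python
from collections import deque, defaultdict
import random

# --- Hand-coded abstract level: a state is a tuple of block heights per slot,
#     and each high-level operator adds one block to a slot (assumed domain). ---
def abstract_successors(state):
    """Enumerate abstract states reachable by one high-level operator."""
    for i in range(len(state)):
        new = list(state)
        new[i] += 1
        yield tuple(new)

def plan_abstract(start, goal):
    """BFS in the hand-coded discrete space; returns the subgoal sequence."""
    frontier, parents = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parents[s]
            return list(reversed(path))[1:]          # drop the start state
        for nxt in abstract_successors(s):
            if nxt not in parents and all(a <= b for a, b in zip(nxt, goal)):
                parents[nxt] = s
                frontier.append(nxt)
    return []

# --- Low level: one goal-conditioned Q-function over (observation, subgoal).
#     A dict stands in for the single network trained from pixels. ---
Q = defaultdict(float)
ACTIONS = list(range(3))                              # e.g. "place a block on slot i"

def q_update(obs, subgoal, action, reward, next_obs, alpha=0.5, gamma=0.95):
    best_next = max(Q[(next_obs, subgoal, a)] for a in ACTIONS)
    key = (obs, subgoal, action)
    Q[key] += alpha * (reward + gamma * best_next - Q[key])

def greedy_action(obs, subgoal, eps=0.1):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(obs, subgoal, a)])

# Example: plan a 2-block tower in slot 0, then learn each local step.
start, goal = (0, 0, 0), (2, 0, 0)
for subgoal in plan_abstract(start, goal):
    obs = start
    for _ in range(50):                               # short local episodes
        a = greedy_action(obs, subgoal)
        next_obs = tuple(h + (1 if i == a else 0) for i, h in enumerate(obs))
        reward = 1.0 if next_obs == subgoal else 0.0
        q_update(obs, subgoal, a, reward, next_obs)
        obs = start if reward else next_obs           # reset after reaching the subgoal
    start = subgoal                                   # next local policy starts here
print("subgoal plan:", plan_abstract((0, 0, 0), (2, 0, 0)))
```

The sketch only illustrates the division of labor: planning handles the long-horizon sequencing over dozens of goal structures, while learning is confined to short, local reaching problems conditioned on the current subgoal.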