In this paper, we study the problem of learning a repertoire of low-level skills from raw images that can be sequenced to complete long-horizon visuomotor tasks. Reinforcement learning (RL) is a promising approach for acquiring short-horizon skills autonomously. However, the focus of RL algorithms has largely been on the success of those individual skills, more so than learning and grounding a large repertoire of skills that can be sequenced to complete extended multi-stage tasks. The latter demands robustness and persistence, as errors in skills can compound over time, and may require the robot to have a number of primitive skills in its repertoire, rather than just one. To this end, we introduce EMBER, a model-based RL method for learning primitive skills that are suitable for completing long-horizon visuomotor tasks. EMBER learns and plans using a learned model, critic, and success classifier, where the success classifier serves both as a reward function for RL and as a grounding mechanism to continuously detect whether the robot should retry a skill when it is unsuccessful or under perturbations. Further, the learned model is task-agnostic and trained using data from all skills, enabling the robot to efficiently learn a number of distinct primitives. These visuomotor primitive skills and their associated pre- and post-conditions can then be directly combined with off-the-shelf symbolic planners to complete long-horizon tasks. On a Franka Emika robot arm, we find that EMBER enables the robot to complete three long-horizon visuomotor tasks with an 85% success rate, such as organizing an office desk, a file cabinet, and drawers, which require sequencing up to 12 skills, involve 14 unique learned primitives, and demand generalization to novel objects.
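As a rough illustration of the retry mechanism described above, and not the paper's actual implementation, the sketch below shows how a per-skill success classifier can act as a continuously checked post-condition: a skill keeps executing (and thus retries under perturbation) until its classifier reports success or a step budget is exhausted, at which point the next skill in the symbolic plan runs. All names here (`SkillPrimitive`, `execute_plan`, `get_obs`, `step_env`) are hypothetical.

```python
# Minimal sketch, assuming each learned primitive exposes a policy and a
# success classifier; the classifier doubles as the skill's post-condition.
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np


@dataclass
class SkillPrimitive:
    """A learned visuomotor primitive with a success classifier as its post-condition."""
    act: Callable[[np.ndarray], np.ndarray]   # image observation -> action
    success: Callable[[np.ndarray], float]    # image observation -> estimated P(success)


def execute_plan(skills: Sequence[SkillPrimitive],
                 get_obs: Callable[[], np.ndarray],
                 step_env: Callable[[np.ndarray], None],
                 max_steps: int = 100,
                 threshold: float = 0.9) -> bool:
    """Sequence skills, re-executing each one until its classifier signals success."""
    for skill in skills:
        for _ in range(max_steps):
            obs = get_obs()
            if skill.success(obs) > threshold:  # post-condition satisfied, move to next skill
                break
            step_env(skill.act(obs))            # otherwise keep acting (implicit retry)
        else:
            return False                        # skill never succeeded within its budget
    return True
```

In this hypothetical sketch, the same classifier that would provide the RL reward during training is reused at execution time for grounding, which is what allows a perturbed or failed skill to be detected and retried before the plan moves on.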