Real-world robotic manipulation tasks remain an elusive challenge, since they involve both fine-grained environment interaction and the ability to plan for long-horizon goals. Although deep reinforcement learning (RL) methods have shown encouraging results when planning end-to-end in high-dimensional environments, they remain fundamentally limited by poor sample efficiency, due to inefficient exploration and the complexity of credit assignment over long horizons. In this work, we present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL to achieve complex, long-horizon manipulation tasks. We leverage task-agnostic play data to learn a discrete behavioral prior over object-centric primitives, modeling their feasibility given the current context. We then design a high-level goal-conditioned policy that (1) uses primitives as building blocks to scaffold complex long-horizon tasks and (2) leverages the behavioral prior to accelerate learning. We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines across multiple realistic manipulation tasks and learns policies that can be easily transferred to physical hardware.
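To make the two components concrete, the following is a minimal sketch, not the authors' implementation: a discrete behavioral prior p(k | s) over K object-centric primitives, trained as a classifier on play data, and a goal-conditioned high-level policy whose logits are biased by the prior in log space so that sampling favors feasible primitives. All class and parameter names (BehavioralPrior, HighLevelPolicy, state_dim, num_primitives, etc.) are illustrative assumptions.

```python
# Hypothetical sketch of ELF-P's two learned components; names and
# architecture choices are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BehavioralPrior(nn.Module):
    """p(k | s): feasibility of each primitive k in state s, trained as a
    classifier on (state, primitive) pairs extracted from play data."""
    def __init__(self, state_dim: int, num_primitives: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_primitives),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Log-probabilities so the prior can be added to policy logits.
        return F.log_softmax(self.net(state), dim=-1)

class HighLevelPolicy(nn.Module):
    """pi(k | s, g): goal-conditioned policy over discrete primitives; the
    behavioral prior biases sampling toward primitives feasible in s."""
    def __init__(self, state_dim: int, goal_dim: int, num_primitives: int,
                 hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_primitives),
        )

    def forward(self, state, goal, prior_logp):
        logits = self.net(torch.cat([state, goal], dim=-1))
        # Adding log p(k | s) down-weights primitives the prior deems
        # infeasible, focusing exploration and speeding up learning.
        return torch.distributions.Categorical(logits=logits + prior_logp)

# Usage: sample the next primitive, to be executed by a low-level planner.
prior = BehavioralPrior(state_dim=32, num_primitives=8)
policy = HighLevelPolicy(state_dim=32, goal_dim=32, num_primitives=8)
s, g = torch.randn(1, 32), torch.randn(1, 32)
k = policy(s, g, prior(s)).sample()  # index of the chosen primitive
```

The log-space combination is one common way to fold a behavioral prior into a discrete policy; the paper may use a different mechanism (e.g., masking or KL regularization toward the prior).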