We present a visually grounded hierarchical planning algorithm for long-horizon manipulation tasks. Our algorithm offers a joint framework of neuro-symbolic task planning and low-level motion generation conditioned on the specified goal. At the core of our approach is a two-level scene graph representation, namely a geometric scene graph and a symbolic scene graph. This hierarchical representation serves as a structured, object-centric abstraction of manipulation scenes. Our model uses graph neural networks to process these scene graphs for predicting high-level task plans and low-level motions. We demonstrate that our method scales to long-horizon tasks and generalizes well to novel task goals. We validate our method on a kitchen storage task in both physical simulation and the real world. Our experiments show that our method achieves an over 70% success rate and a nearly 90% subgoal completion rate on the real robot, while being four orders of magnitude faster in computation time than a standard search-based task-and-motion planner.
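To make the two-level representation concrete, the following is a minimal Python sketch of what a geometric and a symbolic scene graph might look like, together with a toy abstraction rule that derives an `on` predicate from object poses. All class and function names (`GeometricSceneGraph`, `SymbolicSceneGraph`, `abstract`) and the contact heuristic are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class GeometricNode:
    """One object in the geometric scene graph: continuous state only."""
    name: str
    pose: Tuple[float, float, float]  # object center (x, y, z); rotation omitted for brevity
    bbox: Tuple[float, float, float]  # axis-aligned extents (dx, dy, dz)

@dataclass
class GeometricSceneGraph:
    """Geometric layer: nodes are objects; edges carry relative poses
    between spatially adjacent objects."""
    nodes: Dict[str, GeometricNode] = field(default_factory=dict)
    edges: Dict[Tuple[str, str], Tuple[float, float, float]] = field(default_factory=dict)

@dataclass
class SymbolicSceneGraph:
    """Symbolic layer: the same objects, but edges are discrete predicates
    such as ('on', 'mug', 'counter') or ('in', 'can', 'cabinet')."""
    objects: List[str] = field(default_factory=list)
    predicates: List[Tuple[str, str, str]] = field(default_factory=list)

def abstract(geo: GeometricSceneGraph, z_eps: float = 0.02) -> SymbolicSceneGraph:
    """Illustrative rule: A is 'on' B if A's bottom face rests within
    z_eps of B's top face. A real system would use learned or
    perception-driven predicate classifiers instead."""
    sym = SymbolicSceneGraph(objects=list(geo.nodes))
    for a in geo.nodes.values():
        for b in geo.nodes.values():
            if a.name == b.name:
                continue
            top_of_b = b.pose[2] + b.bbox[2] / 2
            bottom_of_a = a.pose[2] - a.bbox[2] / 2
            if abs(bottom_of_a - top_of_b) < z_eps:
                sym.predicates.append(("on", a.name, b.name))
    return sym
```

In this sketch, a task planner would operate over `SymbolicSceneGraph.predicates` to choose subgoals, while a motion generator would consume the continuous poses in the geometric layer; the two graphs share the same object nodes, which is what makes the abstraction object-centric.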