Robots planning long-horizon behavior in complex environments must be able to quickly reason about the impact of the environment's geometry on what plans are feasible, i.e., whether there exist action parameter values that satisfy all constraints on a candidate plan. In tasks involving articulated and movable obstacles, typical Task and Motion Planning (TAMP) algorithms spend most of their runtime attempting to solve unsolvable constraint satisfaction problems imposed by infeasible plan skeletons. We develop a novel Transformer-based architecture, PIGINet, that predicts plan feasibility from the initial state, the goal, and a candidate plan, fusing image and text embeddings with state features. The model sorts the plan skeletons produced by a TAMP planner according to their predicted satisfiability likelihoods. We evaluate the runtime of our learning-enabled TAMP algorithm on several distributions of kitchen rearrangement problems, comparing its performance to that of non-learning baselines and algorithm ablations. Our experiments show that PIGINet substantially improves planning efficiency, reducing runtime by 80% on average on pick-and-place problems with articulated obstacles. It also achieves zero-shot generalization to problems with unseen object categories thanks to its visual encoding of objects.
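To make the fusion-and-ranking idea concrete, the following is a minimal sketch of a PIGINet-style feasibility scorer, not the authors' released implementation: all module names, embedding dimensions, and the mean-pooling head are illustrative assumptions. Each candidate plan is encoded as a sequence of per-step tokens that concatenate image, text, and state features; a standard PyTorch Transformer encoder pools the sequence into a single feasibility probability, and candidate plan skeletons are sorted by that score before the TAMP planner attempts to refine them.

```python
import torch
import torch.nn as nn

class FeasibilityScorer(nn.Module):
    """Hypothetical sketch of a PIGINet-style plan feasibility predictor.

    Fuses image, text (goal/action), and state features per plan step,
    then scores the whole plan with a Transformer encoder. Dimensions
    and architecture details are assumptions, not the paper's exact spec.
    """

    def __init__(self, img_dim=512, txt_dim=384, state_dim=64, d_model=256):
        super().__init__()
        # Project the concatenated modalities into a shared token space.
        self.fuse = nn.Linear(img_dim + txt_dim + state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, 1)  # single feasibility logit

    def forward(self, img_emb, txt_emb, state_feat):
        # img_emb:    (batch, seq, img_dim)   per-step visual features
        # txt_emb:    (batch, seq, txt_dim)   goal/action text features
        # state_feat: (batch, seq, state_dim) object poses, joint states, etc.
        tokens = self.fuse(torch.cat([img_emb, txt_emb, state_feat], dim=-1))
        enc = self.encoder(tokens)               # (batch, seq, d_model)
        pooled = enc.mean(dim=1)                 # simple mean pooling
        return torch.sigmoid(self.head(pooled)).squeeze(-1)


def rank_skeletons(model, candidates):
    """Sort candidate plan skeletons by predicted feasibility, best first.

    `candidates` is a list of (img_emb, txt_emb, state_feat) tuples,
    each batched with batch size 1.
    """
    with torch.no_grad():
        scores = [model(*feats).item() for feats in candidates]
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return order, scores
```

Under this sketch, the TAMP planner would attempt constraint satisfaction on skeletons in the returned order, so refinement effort is spent first on the plans most likely to be satisfiable.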