Long-horizon sequential manipulation tasks are effectively addressed hierarchically: at a high level of abstraction, the planner searches over abstract action sequences, and when a plan is found, lower-level motion plans are generated. Such a strategy hinges on the ability to reliably predict that a feasible low-level plan will be found that satisfies the abstract plan. However, computing Abstract Plan Feasibility (APF) is difficult because the outcome of a plan depends on real-world phenomena that are hard to model, such as noise in estimation and execution. In this work, we present an active learning approach to efficiently acquire an APF predictor through task-independent, curious exploration on a robot. The robot identifies plans whose outcomes would be informative about APF, executes those plans, and learns from their successes or failures. Critically, we leverage an infeasible subsequence property to prune candidate plans in the active learning strategy, allowing our system to learn from less data. We evaluate our strategy in simulation and on a real Franka Emika Panda robot with integrated perception, experimentation, planning, and execution. In a stacking domain where objects have non-uniform mass distributions, we show that our system permits real-robot learning of an APF model in four hundred self-supervised interactions, and that the learned model can be used effectively in multiple downstream tasks.
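To make the select-execute-learn loop and the pruning step more concrete, here is a minimal Python sketch. It is not the authors' implementation; everything in it is an illustrative assumption. In particular, the plan representation (a tuple of action labels), the use of ensemble variance as the informativeness score, and the prefix form of the infeasible subsequence property are all assumed. The hypothetical helpers `has_infeasible_prefix`, `disagreement`, `select_plan`, and `record_outcome` are introduced here only for illustration. Under the assumed property, if some prefix of a plan is already known to fail, the whole plan must fail, so executing it would teach the model nothing new.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

# Hypothetical representation: an abstract action is a label such as
# "place B on A", and an abstract plan is an ordered tuple of actions.
Action = str
Plan = Tuple[Action, ...]
Model = Callable[[Plan], float]  # returns a predicted P(plan is feasible)


def has_infeasible_prefix(plan: Plan, infeasible: Set[Plan]) -> bool:
    """Return True if some prefix of `plan` is already known to be infeasible.

    Assumes the prefix form of the infeasible subsequence property: if the
    first k actions cannot be executed, no plan starting with them can be.
    """
    return any(plan[:k] in infeasible for k in range(1, len(plan) + 1))


def disagreement(probs: Sequence[float]) -> float:
    """Variance of the ensemble's feasibility predictions (an assumed
    stand-in for an informativeness score)."""
    mean = sum(probs) / len(probs)
    return sum((p - mean) ** 2 for p in probs) / len(probs)


def select_plan(candidates: Sequence[Plan], ensemble: Sequence[Model],
                infeasible: Set[Plan]) -> Optional[Plan]:
    """Prune plans whose outcome is already determined, then return the
    remaining plan on which the ensemble disagrees most (None if none remain)."""
    remaining = [p for p in candidates if not has_infeasible_prefix(p, infeasible)]
    if not remaining:
        return None
    return max(remaining, key=lambda p: disagreement([m(p) for m in ensemble]))


def record_outcome(plan: Plan, success: bool,
                   dataset: List[Tuple[Plan, bool]], infeasible: Set[Plan]) -> None:
    """Store the self-supervised label and remember failures for later pruning."""
    dataset.append((plan, success))
    if not success:
        infeasible.add(plan)
```

For example, once `("A",)` has been recorded as a failure, the candidate `("A", "B")` is pruned without being scored. The robot then executes whichever surviving plan the ensemble is least sure about and passes the observed success or failure to `record_outcome`. Pruning before scoring reflects the point made in the abstract: plans whose outcome is already implied by earlier failures are discarded, so the interaction budget is spent only on plans that can still be informative.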