Learning compositional models of robot skills for task and motion planning

The objective of this work is to augment the basic abilities of a robot by learning to use sensorimotor primitives to solve complex long-horizon manipulation problems. This requires flexible generative planning that can combine primitive abilities in novel combinations and thus generalize across a wide variety of problems. In order to plan with primitive actions, we must have models of the actions: under what circumstances will executing this primitive successfully achieve some particular effect in the world? We use, and develop novel improvements on, state-of-the-art methods for active learning and sampling. We use Gaussian process methods for learning the constraints on skill effectiveness from small numbers of expensive-to-collect training examples. Additionally, we develop efficient adaptive sampling methods for generating a comprehensive and diverse sequence of continuous candidate control parameter values (such as pouring waypoints for a cup) during planning. These values become end-effector goals for traditional motion planners that then solve for a full robot motion that performs the skill. By using learning and planning methods in conjunction, we take advantage of the strengths of each and plan for a wide variety of complex dynamic manipulation tasks. We demonstrate our approach in an integrated system, combining traditional robotics primitives with our newly learned models using an efficient robot task and motion planner. We evaluate our approach both in simulation and in the real world by measuring the quality of the selected primitive actions. Finally, we apply our integrated system to a variety of long-horizon simulated and real-world manipulation problems.
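
The pipeline described above (learn where a skill succeeds from a few expensive trials, then sample diverse parameters for the planner) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a scalar skill score whose positive values mean success, fits a zero-mean Gaussian process with a fixed RBF kernel (the hyperparameters are arbitrary), chooses each new trial with a straddle-style level-set heuristic (1.96·σ − |μ − threshold|), and then returns confidently feasible parameters spread out by greedy farthest-point selection. `run_skill_trial`, `GPScoreModel`, `active_learn`, and `diverse_feasible_samples` are hypothetical names, and the 2-D "skill" is a toy stand-in.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential kernel (prior variance 1.0) between rows of a and b.
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def run_skill_trial(theta):
    # HYPOTHETICAL stand-in for an expensive robot trial: the skill "succeeds"
    # (score > 0) when the 2-D parameter lies within 0.3 of (0.5, 0.5).
    r2 = np.sum((np.asarray(theta) - 0.5) ** 2)
    return float(np.clip(1.0 - r2 / 0.09, -1.0, 1.0))

class GPScoreModel:
    """Zero-mean GP regression on the skill score (fixed hyperparameters)."""
    def __init__(self, noise=1e-4, length_scale=0.3):
        self.noise, self.length_scale = noise, length_scale

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y, float)
        K = rbf_kernel(self.X, self.X, self.length_scale)
        self.L = np.linalg.cholesky(K + self.noise * np.eye(len(self.X)))
        # alpha = K^{-1} y via the Cholesky factor K = L L^T.
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, self.y))
        return self

    def predict(self, Xs):
        Ks = rbf_kernel(np.asarray(Xs, float), self.X, self.length_scale)
        mu = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = 1.0 - np.sum(v**2, axis=0)  # prior variance is 1.0
        return mu, np.sqrt(np.maximum(var, 1e-12))

def active_learn(n_init=5, n_queries=30, seed=0, threshold=0.0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_init, 2))
    y = np.array([run_skill_trial(x) for x in X])
    model = GPScoreModel().fit(X, y)
    for _ in range(n_queries):
        pool = rng.uniform(0.0, 1.0, size=(500, 2))
        mu, sd = model.predict(pool)
        # Straddle heuristic: query where the success boundary is most uncertain.
        x_next = pool[np.argmax(1.96 * sd - np.abs(mu - threshold))]
        X = np.vstack([X, x_next])
        y = np.append(y, run_skill_trial(x_next))
        model.fit(X, y)
    return model, rng

def diverse_feasible_samples(model, rng, k=5, n_pool=2000, beta=1.0):
    pool = rng.uniform(0.0, 1.0, size=(n_pool, 2))
    mu, sd = model.predict(pool)
    ok = mu - beta * sd > 0.0  # keep only confidently successful parameters
    good, good_mu = pool[ok], mu[ok]
    if len(good) == 0:
        return good  # shape (0, 2): no confident candidates yet
    chosen = [good[np.argmax(good_mu)]]  # start from the best predicted score
    while len(chosen) < min(k, len(good)):
        d = np.linalg.norm(good[:, None, :] - np.array(chosen)[None, :, :], axis=2)
        chosen.append(good[np.argmax(d.min(axis=1))])  # farthest from chosen set
    return np.array(chosen)

model, rng = active_learn(n_init=5, n_queries=30)
candidates = diverse_feasible_samples(model, rng, k=5)
```

In a planner, each row of `candidates` would play the role of a control parameter (for example a pouring waypoint) handed to a motion planner as an end-effector goal. Farthest-point selection is used here only as a simple, easily checked way to obtain diversity; it is a swapped-in illustration, not the sampling method the abstract refers to.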
