Manipulation planning is the problem of finding a sequence of robot configurations that involves interactions with objects in the scene, e.g., grasping and placing an object, or more general tool use. To achieve such interactions, traditional approaches require hand-engineering of object representations and interaction constraints, which quickly becomes tedious when complex objects and interactions are considered. Inspired by recent advances in 3D modeling, e.g., NeRF, we propose a method to represent objects as continuous functions upon which constraint features are defined and jointly trained. In particular, the proposed pixel-aligned representation is directly inferred from images with known camera geometry and naturally acts as a perception component in the whole manipulation pipeline, thereby enabling long-horizon planning from visual input alone. Project page: https://sites.google.com/view/deep-visual-constraints
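To make the core idea concrete, the sketch below illustrates a pixel-aligned implicit representation in the general style the abstract describes: a 3D query point is projected into each posed image, image features are bilinearly sampled at the projected pixel, aggregated across views, and decoded by an MLP into a constraint feature. This is a minimal illustrative sketch, not the authors' implementation; the class name, network sizes, mean-pooling aggregation, and the single scalar output are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedConstraintField(nn.Module):
    """Hypothetical sketch: an object represented as a continuous function
    of 3D points, conditioned on pixel-aligned image features. All layer
    sizes and the aggregation scheme are illustrative assumptions."""

    def __init__(self, feat_dim=64, out_dim=1):
        super().__init__()
        # Small CNN encoder producing a per-image feature map (assumed).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # MLP decodes aggregated pixel features (+ query point) into a
        # constraint feature, e.g. an interaction feasibility score.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, images, K, T_world_cam, points):
        # images: (V, 3, H, W) posed views; K: (V, 3, 3) intrinsics;
        # T_world_cam: (V, 4, 4) camera poses; points: (N, 3) queries.
        feats = self.encoder(images)                       # (V, C, h, w)
        # Transform world-frame query points into each camera frame.
        T_cam_world = torch.linalg.inv(T_world_cam)        # (V, 4, 4)
        pts_h = F.pad(points, (0, 1), value=1.0)           # (N, 4)
        pts_cam = torch.einsum('vij,nj->vni', T_cam_world, pts_h)[..., :3]
        # Perspective projection with the known intrinsics.
        uvw = torch.einsum('vij,vnj->vni', K, pts_cam)
        uv = uvw[..., :2] / uvw[..., 2:3].clamp(min=1e-6)  # (V, N, 2)
        # Normalize pixel coordinates to [-1, 1] for grid_sample.
        H, W = images.shape[-2:]
        uv_norm = torch.stack(
            [2 * uv[..., 0] / (W - 1) - 1, 2 * uv[..., 1] / (H - 1) - 1],
            dim=-1)                                        # (V, N, 2)
        # Bilinearly sample pixel-aligned features at the projections.
        sampled = F.grid_sample(
            feats, uv_norm.unsqueeze(1), align_corners=True
        ).squeeze(2).permute(0, 2, 1)                      # (V, N, C)
        # Aggregate over views (mean, assumed) and decode per point.
        agg = sampled.mean(dim=0)                          # (N, C)
        return self.mlp(torch.cat([agg, points], dim=-1))  # (N, out_dim)
```

Because the representation is queried only through posed images, the same module can serve as the perception front end of a planner: constraint features evaluated at candidate gripper poses can be differentiated with respect to those poses during optimization.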