Manipulation planning is the problem of finding a sequence of robot configurations that involves interactions with objects in the scene, e.g., grasping and placing an object, or more general tool use. To achieve such interactions, traditional approaches require hand-engineering of object representations and interaction constraints, which quickly becomes tedious when complex objects or interactions are considered. Inspired by recent advances in 3D modeling, e.g., NeRF, we propose a method to represent objects as neural implicit functions upon which constraint features are defined and jointly trained. In particular, the proposed pixel-aligned representation is directly inferred from images with known camera geometry and naturally acts as a perception component in the whole manipulation pipeline, thereby enabling long-horizon planning from visual input alone. Video: https://youtu.be/r__mIGTu6Jg
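The abstract does not detail how the pixel-aligned representation is computed. The sketch below is only an illustration of a generic pixel-aligned feature lookup under stated assumptions, not the authors' implementation. It assumes a pinhole camera per view (intrinsics `K`, world-to-camera rotation `R`, translation `t`), a precomputed per-view image feature map of shape `(H, W, C)`, and mean pooling over the views in which a 3D query point is visible. The function names `project`, `bilinear_sample`, and `pixel_aligned_features` are hypothetical.

```python
import numpy as np

def project(points, K, R, t):
    """Project world points (N, 3) to pixel coordinates (N, 2) and camera depths (N,).

    Assumes a pinhole model: x_cam = R @ x_world + t, then [u, v, 1] ~ K @ x_cam.
    """
    cam = points @ R.T + t                   # world frame -> camera frame
    depth = cam[:, 2]
    uvw = cam @ K.T                          # camera frame -> homogeneous pixels
    z = np.maximum(uvw[:, 2:3], 1e-9)        # guard the divide; points behind are masked later
    return uvw[:, :2] / z, depth

def bilinear_sample(feat, uv):
    """Bilinearly interpolate a feature map feat (H, W, C) at pixel coords uv (N, 2)."""
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1)          # clamp to the image border
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = (1 - du) * feat[v0, u0] + du * feat[v0, u1]
    bottom = (1 - du) * feat[v1, u0] + du * feat[v1, u1]
    return (1 - dv) * top + dv * bottom

def pixel_aligned_features(points, views):
    """Average the sampled features over every view that sees each point.

    views: list of (feat, K, R, t) tuples. A point counts as visible in a view if it
    lies in front of the camera and projects inside the image. Returns (N, C);
    points seen by no view get all-zero features.
    """
    total = None
    count = np.zeros((points.shape[0], 1))
    for feat, K, R, t in views:
        uv, depth = project(points, K, R, t)
        H, W, _ = feat.shape
        visible = ((depth > 0)
                   & (uv[:, 0] >= 0) & (uv[:, 0] <= W - 1)
                   & (uv[:, 1] >= 0) & (uv[:, 1] <= H - 1))
        f = bilinear_sample(feat, uv) * visible[:, None]
        total = f if total is None else total + f
        count += visible[:, None]
    return total / np.maximum(count, 1)
```

In a full pipeline, the pooled feature (often concatenated with the query coordinates) would be passed to learned heads such as an occupancy or constraint-feature network. Those heads are left out here because the abstract does not specify them. Mean pooling is just one simple way to fuse multiple views.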