Object geometry is key information for robot manipulation. Yet object reconstruction is challenging because cameras capture only partial observations of objects, especially under occlusion. In this paper, we leverage two additional sources of information to reduce the ambiguity of visual signals. First, generative models learn shape priors over commonly seen objects, allowing us to make reasonable guesses about the unseen parts of an object's geometry. Second, contact information, which can be obtained from videos and physical interactions, provides sparse constraints on the boundary of that geometry. We combine the two sources of information through contact-guided 3D generation, with a guidance formulation inspired by drag-based editing in generative models. Experiments on synthetic and real-world data show that our approach improves reconstruction quality over both pure 3D generation and contact-based optimization.
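To make the guidance idea concrete, below is a minimal sketch of one contact-guided denoising step, assuming the shape is represented as a point cloud and produced by a hypothetical differentiable `denoiser(x, t)` network. The nearest-neighbor contact loss, the identity stand-in denoiser, and the single-scale gradient update are illustrative assumptions, not the paper's actual formulation; they only show how sparse contact points can steer a generative sampler, analogous to point targets in drag-based editing.

```python
import torch

def contact_loss(shape_pts, contact_pts):
    # Mean distance from each contact point to its nearest shape point;
    # minimizing it pulls the generated surface through the contacts.
    d = torch.cdist(contact_pts, shape_pts)   # (C, N) pairwise distances
    return d.min(dim=1).values.mean()

def guided_step(x_t, t, denoiser, contact_pts, guidance_scale=1.0):
    # One denoising step steered by contact constraints (a sketch, not
    # a full sampler): predict the clean shape, then nudge the sample
    # along the negative gradient of the contact loss.
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)                 # model's clean-shape estimate
    loss = contact_loss(x0_hat, contact_pts)
    grad, = torch.autograd.grad(loss, x_t)
    return (x0_hat - guidance_scale * grad).detach()

# Toy usage with an identity "denoiser" standing in for a trained model.
x = torch.randn(1024, 3)                      # noisy point cloud
contacts = torch.tensor([[0.5, 0.0, 0.0],
                         [0.0, 0.5, 0.0]])    # sparse contact observations
for t in torch.linspace(1.0, 0.0, steps=10):
    x = guided_step(x, t, denoiser=lambda p, s: p, contact_pts=contacts)
```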