We present 3DP3, a framework for inverse graphics that uses inference in a structured generative model of objects, scenes, and images. 3DP3 uses (i) voxel models to represent the 3D shape of objects, (ii) hierarchical scene graphs to decompose scenes into objects and the contacts between them, and (iii) depth image likelihoods based on real-time graphics. Given an observed RGB-D image, 3DP3's inference algorithm infers the underlying latent 3D scene, including the object poses and a parsimonious joint parametrization of these poses, using fast bottom-up pose proposals, novel involutive MCMC updates of the scene graph structure, and, optionally, neural object detectors and pose estimators. We show that 3DP3 enables scene understanding that is aware of 3D shape, occlusion, and contact structure. Our results demonstrate that 3DP3 is more accurate at 6DoF object pose estimation from real images than deep learning baselines and shows better generalization to challenging scenes with novel viewpoints, contact, and partial observability.
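To make the generative structure described above concrete, here is a minimal, self-contained sketch of the analysis-by-synthesis pattern the abstract outlines: a prior over object poses with contact constraints, a toy depth "renderer", and a per-pixel Gaussian depth likelihood. This is not the authors' implementation (3DP3 uses voxel shape models, full scene graphs, and real-time graphics); all names (`sample_scene`, `render_depth`, `log_likelihood`) and the simplified 1D renderer are hypothetical stand-ins for illustration only.

```python
# Illustrative sketch of the abstract's generative model, NOT the 3DP3 codebase.
import numpy as np

rng = np.random.default_rng(0)

def sample_scene(num_objects=3, table_z=0.0):
    """Prior over scenes: each object either rests in contact with the
    table (z pinned by the contact relation) or floats freely.
    This mimics, in miniature, the scene-graph contact structure."""
    scene = []
    for i in range(num_objects):
        in_contact = rng.random() < 0.8           # most objects touch the table
        x, y = rng.uniform(-1.0, 1.0, size=2)     # tabletop position
        z = table_z if in_contact else rng.uniform(0.0, 0.5)
        scene.append({"id": i, "pose": (x, y, z), "contact": in_contact})
    return scene

def render_depth(scene, width=32):
    """Toy 1D 'renderer': nearest object per pixel column, so nearer
    objects occlude farther ones. Stands in for the real-time
    graphics engine used for the depth image likelihood."""
    depth = np.full(width, 2.0)                   # background depth
    for obj in scene:
        x, _, z = obj["pose"]
        col = int((x + 1.0) / 2.0 * (width - 1))
        depth[col] = min(depth[col], 1.0 + z)     # closer surface wins
    return depth

def log_likelihood(observed, rendered, sigma=0.05):
    """Per-pixel Gaussian depth likelihood, a common simplification
    of robust depth likelihoods used in inverse graphics."""
    return -0.5 * np.sum(((observed - rendered) / sigma) ** 2)

# Inference step sketch: score one bottom-up scene proposal against
# an observed (synthetically generated, noisy) depth image.
true_scene = sample_scene()
observed = render_depth(true_scene) + rng.normal(0, 0.05, 32)
proposal = sample_scene()
print("proposal log-likelihood:", log_likelihood(observed, render_depth(proposal)))
```

In the full system, proposals like the one scored above would come from fast bottom-up pose estimators (optionally neural), and the scene-graph structure itself (which objects are in contact with which) would be updated with involutive MCMC moves rather than resampled from the prior.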