Understanding 3D scenes from a single image is fundamental to a wide variety of tasks, such as robotics, motion planning, or augmented reality. Existing works in 3D perception from a single RGB image tend to focus only on geometric reconstruction, or on geometric reconstruction combined with semantic or instance segmentation. Inspired by 2D panoptic segmentation, we propose to unify the tasks of geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the task of panoptic 3D scene reconstruction: from a single RGB image, predicting the complete geometric reconstruction of the scene in the camera frustum of the image, along with semantic and instance segmentations. We thus propose a new approach for holistic 3D scene understanding from a single RGB image, which learns to lift and propagate 2D features from an input image to a 3D volumetric scene representation. We demonstrate that this holistic view of joint scene reconstruction, semantic segmentation, and instance segmentation is beneficial over treating these tasks independently, and it thus outperforms alternative approaches.
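The core operation the abstract describes, lifting 2D image features into a 3D volumetric representation, is commonly implemented by back-projecting voxel centers into the image plane and sampling the corresponding 2D features. The sketch below is a minimal numpy illustration of that idea, not the paper's actual implementation; the function name and parameters are hypothetical, and real systems typically use learned feature maps and differentiable sampling.

```python
import numpy as np

def lift_features_to_frustum(feat2d, K, grid_min, voxel_size, dims):
    """Back-project 2D image features into a 3D voxel grid (camera space).

    feat2d: (C, H, W) feature map; K: 3x3 camera intrinsics.
    Each voxel center is projected into the image; if it lands inside the
    image bounds (and in front of the camera), it copies the nearest
    pixel's feature vector. All voxels along a camera ray thus share the
    same 2D feature, which later 3D processing can disambiguate in depth.
    """
    C, H, W = feat2d.shape
    X, Y, Z = dims
    vol = np.zeros((C, X, Y, Z), dtype=feat2d.dtype)

    # Voxel center coordinates in camera space.
    ix, iy, iz = np.meshgrid(
        np.arange(X), np.arange(Y), np.arange(Z), indexing="ij"
    )
    centers = (
        np.stack([ix, iy, iz], axis=-1) * voxel_size + grid_min + voxel_size / 2
    )
    pts = centers.reshape(-1, 3)

    # Perspective projection of voxel centers into pixel coordinates.
    z = pts[:, 2]
    valid = z > 1e-6  # keep only voxels in front of the camera
    uvw = (K @ pts.T).T
    u = np.round(uvw[:, 0] / np.maximum(uvw[:, 2], 1e-6)).astype(int)
    v = np.round(uvw[:, 1] / np.maximum(uvw[:, 2], 1e-6)).astype(int)
    valid &= (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Nearest-neighbor sampling of the 2D features into the volume.
    flat = vol.reshape(C, -1)
    flat[:, valid] = feat2d[:, v[valid], u[valid]]
    return flat.reshape(C, X, Y, Z)
```

This per-ray feature replication is why a purely lifted volume is ambiguous in depth: the subsequent 3D network must resolve where along each ray surfaces, semantics, and instances actually lie.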