Holistic understanding and 3D reconstruction from a single image is a central task in computer vision. This paper presents an integrated system that performs holistic image segmentation, object detection, instance segmentation, depth estimation, and object instance 3D reconstruction for indoor and outdoor scenes from a single RGB image. We name our system panoptic 3D parsing, in which panoptic segmentation ("stuff" segmentation and "things" detection/segmentation) is performed together with 3D reconstruction. We design a stage-wise system for settings where a complete set of annotations is absent. Additionally, we present an end-to-end pipeline trained on a synthetic dataset with a full set of annotations. We show results on both indoor (3D-FRONT) and outdoor (COCO and Cityscapes) scenes. Our proposed panoptic 3D parsing framework points to a promising direction in computer vision. It can be applied to various applications, including autonomous driving, mapping, robotics, design, computer graphics, human-computer interaction, and augmented reality.