This paper presents a unified framework for depth-aware panoptic segmentation (DPS), which aims to reconstruct the 3D scene with instance-level semantics from a single image. Prior works address this problem by simply adding a dense depth regression head to panoptic segmentation (PS) networks, resulting in two independent task branches. This neglects the mutually beneficial relationship between the two tasks, thus failing to exploit readily available instance-level semantic cues to improve depth accuracy and consequently producing sub-optimal depth maps. To overcome these limitations, we propose a unified framework for the DPS task by applying a dynamic convolution technique to both the PS and depth prediction tasks. Specifically, instead of predicting depth for all pixels at once, we generate instance-specific kernels to predict the depth and segmentation mask of each instance. Moreover, leveraging this instance-wise depth estimation scheme, we introduce additional instance-level depth cues to supervise depth learning through a new depth loss. Extensive experiments on Cityscapes-DPS and SemKITTI-DPS show the effectiveness and promise of our method. We hope our unified solution to DPS can lead to a new paradigm in this area. Code is available at https://github.com/NaiyuGao/PanopticDepth.
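The key mechanism described above is replacing a single dense depth head with per-instance dynamic kernels. The snippet below is a minimal PyTorch sketch of that idea, not the authors' implementation: the module name `InstanceKernelHead`, the helper `instance_depth_loss`, the 1x1 dynamic convolutions, and the plain L1 per-instance depth term are all simplifying assumptions made for illustration.

```python
# Minimal sketch (assumption): single-level features and 1x1 dynamic convs,
# instead of the full multi-scale network used in the actual repository.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InstanceKernelHead(nn.Module):
    """Generates one mask kernel and one depth kernel per instance embedding."""

    def __init__(self, in_channels=256, embed_channels=8):
        super().__init__()
        # Each instance embedding is mapped to two 1x1-conv kernels.
        self.to_mask_kernel = nn.Linear(in_channels, embed_channels)
        self.to_depth_kernel = nn.Linear(in_channels, embed_channels)

    def forward(self, inst_embeds, mask_feat, depth_feat):
        # inst_embeds: (N, C), one embedding per detected instance
        # mask_feat / depth_feat: (E, H, W) shared dense feature maps
        mask_kernels = self.to_mask_kernel(inst_embeds)    # (N, E)
        depth_kernels = self.to_depth_kernel(inst_embeds)  # (N, E)
        # Dynamic 1x1 convolution: one output map per instance.
        masks = torch.einsum("ne,ehw->nhw", mask_kernels, mask_feat)
        depths = torch.einsum("ne,ehw->nhw", depth_kernels, depth_feat)
        return masks.sigmoid(), F.relu(depths)


def instance_depth_loss(pred_depths, gt_depth, inst_masks):
    """Hypothetical instance-level depth loss: L1 error averaged inside each
    instance's mask, so every instance contributes regardless of its size."""
    losses = []
    for pred, mask in zip(pred_depths, inst_masks):  # pred, mask: (H, W)
        valid = mask & (gt_depth > 0)                # skip pixels without GT depth
        if valid.any():
            losses.append((pred[valid] - gt_depth[valid]).abs().mean())
    return torch.stack(losses).mean() if losses else pred_depths.sum() * 0.0
```

In this simplified form, each instance contributes only two small kernel vectors (2 x 8 weights with the defaults above) that are convolved with the shared feature maps, which is what makes the per-instance depth and mask predictions cheap compared with adding a separate dense head per task.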