Pseudo depth maps are depth map predictions used as ground truth during training. In this paper, we leverage pseudo depth maps to segment objects of classes that have never been seen during training, which renders our object segmentation task an open-world task. The pseudo depth maps are generated by pretrained networks that have either been trained explicitly to generalize to downstream tasks (LeReS and MiDaS) or trained in an unsupervised fashion on video sequences (Monodepth2). To tell the network which object to segment, we provide a single click on the object's surface in the pseudo depth map of the image as input. We test our approach in two scenarios: one without the RGB image and one where the RGB image is part of the input. Our results demonstrate considerably better generalization from seen to unseen object types when depth is used. On the Semantic Boundaries Dataset, when training on only half of the classes and segmenting on depth maps alone, the IoU score on unseen classes improves from $61.57$ to $69.79$.
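The abstract states only that a single click on the object's surface is given to the network together with the pseudo depth map; it does not specify the encoding. A minimal sketch of one common encoding, assumed here for illustration, turns the click into a Gaussian heatmap channel stacked with the depth map:

```python
import numpy as np

def click_to_input(depth, click_yx, sigma=10.0):
    """Encode a user click as a Gaussian heatmap channel and stack it
    with the pseudo depth map. Hypothetical encoding: the paper only
    says a single click on the object is part of the input."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = click_yx
    # Heatmap peaks at 1.0 on the clicked pixel and decays with distance.
    heat = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    # Resulting tensor of shape (2, H, W): depth channel + click channel.
    return np.stack([depth, heat.astype(depth.dtype)], axis=0)

x = click_to_input(np.random.rand(64, 64).astype(np.float32), (20, 30))
```

In the RGB variant of the experiments, the three color channels would simply be stacked alongside these two, giving a five-channel input.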