Estimating a depth map from a single RGB image has been widely investigated for localization, mapping, and 3D object detection. Recent studies on single-view depth estimation are mostly based on deep convolutional neural networks (ConvNets), which require a large amount of training data paired with densely annotated labels. Depth annotation is both expensive and inefficient, so it is natural to leverage RGB images, which can be collected very easily, to boost the performance of ConvNets without depth labels. However, most self-supervised learning algorithms focus on capturing the semantic information of images to improve performance in classification or object detection, not in depth estimation. In this paper, we show that existing self-supervised methods do not perform well on depth estimation, and we propose a gradient-based self-supervised learning algorithm with a momentum contrastive loss to help ConvNets extract geometric information from unlabeled images. As a result, the network can estimate the depth map accurately with a relatively small amount of annotated data. To show that our method is independent of the model structure, we evaluate it with two different monocular depth estimation algorithms. Our method outperforms the previous state-of-the-art self-supervised learning algorithms and makes labeled data three times as efficient as random initialization on the NYU Depth v2 dataset.
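The abstract names two ingredients, image gradients and a momentum contrastive loss, without giving their exact form. The following is a minimal NumPy sketch of how such pieces are commonly written, not the method of the paper. It assumes that "gradient" refers to spatial image gradients and that the loss follows the usual InfoNCE formulation with a queue of negative keys and a momentum-updated key encoder. All function names (`image_gradients`, `info_nce_loss`, `momentum_update`) and the default values of `temperature` and `m` are hypothetical choices made for illustration.

```python
import numpy as np


def image_gradients(gray):
    """Finite-difference gradient magnitude of a 2-D grayscale image.

    Assumption: used here as a simple geometric cue; the paper may use a
    different gradient operator.
    """
    gy, gx = np.gradient(gray.astype(np.float64))  # derivatives along rows, cols
    return np.sqrt(gx ** 2 + gy ** 2)


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """InfoNCE contrastive loss.

    q:     (N, D) query embeddings
    k_pos: (N, D) positive key embeddings (one per query)
    queue: (K, D) negative key embeddings shared by all queries
    """
    q, k_pos, queue = l2_normalize(q), l2_normalize(k_pos), l2_normalize(queue)
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)  # (N, 1) positive logits
    l_neg = q @ queue.T                               # (N, K) negative logits
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average of query-encoder weights into the key encoder."""
    return [m * pk + (1.0 - m) * pq for pk, pq in zip(key_params, query_params)]
```

In a training loop of this kind, the query encoder would be updated by backpropagating the contrastive loss, while the key encoder would only change through `momentum_update`, which keeps the keys in the queue consistent across iterations.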