We review solutions to the problem of depth estimation, arguably the most important subtask in scene understanding, focusing on the single-image setting. Because of its properties, single image depth estimation is currently best tackled with machine learning methods, most successfully with convolutional neural networks. We provide an overview of the field by examining key works. We cover both non-deep-learning approaches, which mostly predate deep learning and rely on hand-crafted features and assumptions, and more recent works, which mostly use deep learning techniques. The single image depth estimation problem is tackled either in a supervised fashion, with absolute or relative depth information acquired from human- or sensor-labeled data, or in an unsupervised way, using unlabeled stereo images or video datasets. We also study multitask approaches that combine depth estimation with related tasks such as semantic segmentation and surface normal estimation. Finally, we discuss investigations into the mechanisms, principles, and failure cases of contemporary solutions.