Real-time estimation of actual object depth is an essential module for various autonomous system tasks such as 3D reconstruction, scene understanding, and condition assessment of machinery parts. Over the last decade, the extensive deployment of deep learning methods in computer vision has yielded approaches that achieve realistic depth synthesis from a single RGB image. While most of these models rely on paired depth data or on the availability of video sequences and stereo images, single-view depth synthesis in a fully unsupervised setting has hardly been explored. This study leverages recent advances in generative neural networks to perform fully unsupervised single-shot depth synthesis. Two generators, one for RGB-to-depth and one for depth-to-RGB transfer, are implemented and optimized simultaneously using the Wasserstein-1 distance and a novel perceptual reconstruction term. To demonstrate the plausibility of the proposed method, we comprehensively evaluate the models on industrial surface depth data as well as on the Texas 3D Face Recognition Database and the SURREAL dataset of human body depth. The success observed in this study suggests great potential for unsupervised single-shot depth estimation in real-world applications.
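To make the two-generator objective concrete, the following is a minimal sketch of how the combined adversarial and perceptual reconstruction loss could look in PyTorch. It assumes a WGAN-style setup (critics approximating the Wasserstein-1 distance via the Kantorovich-Rubinstein dual) and a fixed feature extractor for the perceptual term; all module names, the weighting factor `lam`, and the choice of L1 feature distance are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the two-generator objective described above.
# G_rgb2d, G_d2rgb: generators for RGB-to-depth and depth-to-RGB transfer.
# C_d, C_rgb: critics; assumed 1-Lipschitz (e.g., via gradient penalty,
# omitted here) so their scores approximate the Wasserstein-1 distance.
# phi: a fixed feature extractor, assumed to accept both modalities
# (e.g., single-channel depth replicated to three channels upstream).

def perceptual_loss(phi, x, y):
    """Distance between feature activations of an input and its reconstruction."""
    return nn.functional.l1_loss(phi(x), phi(y))

def generator_loss(G_rgb2d, G_d2rgb, C_d, C_rgb, phi, rgb, depth, lam=10.0):
    # Adversarial terms: each generator tries to maximize the critic score
    # of its synthesized output, i.e., minimize its negation.
    fake_d = G_rgb2d(rgb)
    fake_rgb = G_d2rgb(depth)
    adv = -C_d(fake_d).mean() - C_rgb(fake_rgb).mean()

    # Perceptual reconstruction: cycle each input through both generators
    # and compare feature activations rather than raw pixels.
    rec = perceptual_loss(phi, rgb, G_d2rgb(fake_d)) \
        + perceptual_loss(phi, depth, G_rgb2d(fake_rgb))

    return adv + lam * rec
```

In this sketch, the generators are optimized jointly against the same combined loss while the critics are trained in a separate alternating step, mirroring the simultaneous optimization described above.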