Real-time estimation of actual object depth is an essential module for various autonomous system tasks such as 3D reconstruction, scene understanding and condition assessment. Over the last decade, the widespread adoption of deep learning in computer vision has yielded approaches that achieve realistic depth synthesis from a single RGB image. Most of these models rely on paired RGB-depth data and/or the availability of video sequences and stereo images. In the absence of sequences, stereo data and RGB-depth pairs, depth estimation becomes a fully unsupervised single-image transfer problem that has barely been explored so far. This study builds on recent advances in generative neural networks to establish fully unsupervised single-shot depth estimation. Two generators for RGB-to-depth and depth-to-RGB transfer are implemented and simultaneously optimized using the Wasserstein-1 distance, a novel perceptual reconstruction term and hand-crafted image filters. We comprehensively evaluate the models using industrial surface depth data as well as the Texas 3D Face Recognition Database, the CelebAMask-HQ database of human portraits and the SURREAL dataset that records body depth. For each evaluation dataset the proposed method shows a significant increase in depth accuracy compared to state-of-the-art single-image transfer methods.
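The training objective described above combines an adversarial Wasserstein-1 term with a reconstruction term enforced across the two generators. A minimal NumPy sketch of how such losses are typically composed is given below; the function names, the L1 form of the reconstruction term, and the weighting factor `lam` are illustrative assumptions, not the paper's exact formulation (the paper additionally uses a perceptual reconstruction term and hand-crafted image filters not shown here).

```python
import numpy as np

def wasserstein_critic_loss(critic_real, critic_fake):
    # WGAN critic objective: maximize E[D(real)] - E[D(fake)],
    # written here as a loss to be minimized.
    return np.mean(critic_fake) - np.mean(critic_real)

def cycle_reconstruction_loss(x, x_reconstructed):
    # L1 cycle-consistency term for x -> G_depth(x) -> G_rgb(G_depth(x));
    # a stand-in for the perceptual reconstruction term used in the paper.
    return np.mean(np.abs(x - x_reconstructed))

def generator_loss(critic_fake, x, x_reconstructed, lam=10.0):
    # Adversarial term (fool the critic) plus weighted reconstruction term;
    # lam is an assumed trade-off hyperparameter.
    return -np.mean(critic_fake) + lam * cycle_reconstruction_loss(x, x_reconstructed)
```

In this scheme the two generators are updated jointly: each receives a gradient from the critic on its translated output and from the reconstruction of inputs passed through both generators in sequence.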