In this paper we present a novel self-supervised method to anticipate the depth estimate for a future, unobserved real-world urban scene. This work is the first to explore self-supervised learning for estimation of monocular depth of future unobserved frames of a video. Existing works rely on a large number of annotated samples to generate the probabilistic prediction of depth for unseen frames. However, this makes it unrealistic due to its requirement for large amount of annotated depth samples of video. In addition, the probabilistic nature of the case, where one past can have multiple future outcomes often leads to incorrect depth estimates. Unlike previous methods, we model the depth estimation of the unobserved frame as a view-synthesis problem, which treats the depth estimate of the unseen video frame as an auxiliary task while synthesizing back the views using learned pose. This approach is not only cost effective - we do not use any ground truth depth for training (hence practical) but also deterministic (a sequence of past frames map to an immediate future). To address this task we first develop a novel depth forecasting network DeFNet which estimates depth of unobserved future by forecasting latent features. Second, we develop a channel-attention based pose estimation network that estimates the pose of the unobserved frame. Using this learned pose, estimated depth map is reconstructed back into the image domain, thus forming a self-supervised solution. Our proposed approach shows significant improvements in Abs Rel metric compared to state-of-the-art alternatives on both short and mid-term forecasting setting, benchmarked on KITTI and Cityscapes. Code is available at https://github.com/sauradip/depthForecasting
翻译:在本文中,我们提出了一个全新的自我监督方法,以预测未来、未观测到的真实世界城市场景的深度估计。 这项工作是首次探索自我监督的学习,以估计未来未观测到的视频框架的单幅深度。 现有的工程依靠大量附加说明的样本来生成对隐性框架深度的概率预测。 但是, 这样做并不现实, 因为它需要大量附加注释的深度视频样本。 此外, 案件概率性深度, 过去可能有许多未来结果, 往往导致不正确的深度估计。 与以往的方法不同, 我们将未观测的框架的深度估计建为视觉合成问题。 现有的工程依靠大量附加注释的样本样本样本来生成对隐蔽框架的深度估计。 这种方法不仅成本有效 - 我们没有使用任何地面的深度来进行培训( 实际的), 但也使用确定性的方法( 过去框架地图到近期的顺序 ) 。 为了应对这项任务, 我们首先开发了一个新的深度预测网络, DeFNet 将隐蔽的深度估算作为辅助任务, 因此, 将这个深度的深度估算 将我们所了解的系统 的深度预测 的深度 。