Monocular depth reconstruction of complex and dynamic scenes is a highly challenging problem. While learning-based methods have offered promising results for rigid scenes, even in the unsupervised setting, there exists little to no literature addressing dynamic and deformable scenes. In this work, we present an unsupervised monocular framework for dense depth estimation of dynamic scenes, which jointly reconstructs rigid and non-rigid parts without explicitly modelling the camera motion. Using dense correspondences, we derive a training objective that aims to opportunistically preserve pairwise distances between reconstructed 3D points. In this process, the dense depth map is learned implicitly under the as-rigid-as-possible hypothesis. Our method provides promising results, demonstrating its capability to reconstruct 3D structure from challenging videos of non-rigid scenes. Furthermore, the proposed method provides unsupervised motion segmentation results as an auxiliary output.
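To make the training objective concrete, below is a minimal sketch of a pairwise-distance preservation (as-rigid-as-possible style) loss, assuming known camera intrinsics, a predicted depth map, and dense correspondences between two frames. All function and variable names (`backproject`, `pairwise_rigidity_loss`, `idx_i`, `idx_j`) are hypothetical illustrations, not the paper's actual implementation, and the paper's "opportunistic" weighting of point pairs is omitted here for brevity.

```python
# Hedged sketch, not the authors' code: back-project corresponding pixels
# with predicted depth and penalize changes in pairwise 3D distances.
import torch

def backproject(depth, pixels, K_inv):
    """Lift sampled pixels to 3D camera-space points.

    depth:  (N,) predicted depth at the sampled pixels
    pixels: (N, 2) pixel coordinates (u, v)
    K_inv:  (3, 3) inverse camera intrinsics (assumed known)
    """
    ones = torch.ones_like(pixels[:, :1])
    homog = torch.cat([pixels, ones], dim=1)   # (N, 3) homogeneous pixels
    rays = homog @ K_inv.T                     # (N, 3) viewing rays
    return rays * depth.unsqueeze(1)           # (N, 3) 3D points

def pairwise_rigidity_loss(points_a, points_b, idx_i, idx_j):
    """As-rigid-as-possible style term: for sampled point pairs (i, j),
    the distance between reconstructed 3D points should stay the same
    across the two frames."""
    d_a = (points_a[idx_i] - points_a[idx_j]).norm(dim=1)
    d_b = (points_b[idx_i] - points_b[idx_j]).norm(dim=1)
    return (d_a - d_b).abs().mean()

# Usage sketch: pixels_a/pixels_b are dense correspondences, depth_a/depth_b
# come from the depth network; idx_i/idx_j index random point pairs.
# pts_a = backproject(depth_a, pixels_a, K_inv)
# pts_b = backproject(depth_b, pixels_b, K_inv)
# loss = pairwise_rigidity_loss(pts_a, pts_b, idx_i, idx_j)
```

In the full method, such a term would presumably be combined with a robust or learned per-pair weighting so that distances are preserved only where the scene behaves rigidly, which is also what yields the motion segmentation as a by-product.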