Self-supervised monocular methods can efficiently learn depth for weakly textured surfaces and reflective objects, but their accuracy is limited by the inherent ambiguity of monocular geometric modeling. In contrast, multi-frame depth estimation methods improve accuracy by building on the success of Multi-View Stereo (MVS), which exploits geometric constraints directly. Unfortunately, MVS often suffers from texture-less regions, non-Lambertian surfaces, and moving objects, especially in real-world video sequences with neither known camera motion nor depth supervision. We therefore propose MOVEDepth, which exploits MOnocular cues and VElocity guidance to improve multi-frame Depth learning. Unlike existing methods that enforce consistency between MVS depth and monocular depth, MOVEDepth boosts multi-frame depth learning by directly addressing these inherent weaknesses of MVS. The key to our approach is to use monocular depth as a geometric prior when constructing the MVS cost volume, and to adjust the depth candidates of the cost volume under the guidance of the predicted camera velocity. We further fuse monocular depth and MVS depth by learning uncertainty from the cost volume, yielding depth estimates that are robust to ambiguity in multi-view geometry. Extensive experiments show that MOVEDepth achieves state-of-the-art performance: compared with Monodepth2 and PackNet, our method relatively improves depth accuracy by 20\% and 19.8\% on the KITTI benchmark. MOVEDepth also generalizes to the more challenging DDAD benchmark, relatively outperforming ManyDepth by 7.2\%. The code is available at https://github.com/JeffWang987/MOVEDepth.
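To make the velocity-guided candidate construction concrete, the following is a minimal sketch of how depth hypotheses for a cost volume could be sampled around a monocular depth prior, with the sampling range modulated by the predicted camera velocity. The function name, the `base_ratio` parameter, and the specific narrowing heuristic (a wider range at low speed, where parallax is weak) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def velocity_guided_candidates(mono_depth, velocity, num_candidates=8,
                               base_ratio=0.2):
    """Sample per-pixel depth hypotheses around a monocular depth prior.

    mono_depth: (H, W) array, monocular depth prediction (the prior).
    velocity:   scalar, magnitude of predicted camera velocity.
    Returns:    (num_candidates, H, W) array of depth candidates.

    Hypothetical heuristic: shrink the search range as velocity grows,
    since larger translation yields stronger multi-view parallax cues.
    """
    ratio = base_ratio / (1.0 + velocity)      # relative half-width of range
    lo = mono_depth * (1.0 - ratio)            # lower bound per pixel
    hi = mono_depth * (1.0 + ratio)            # upper bound per pixel
    steps = np.linspace(0.0, 1.0, num_candidates)        # (K,)
    # Broadcast to (K, H, W): evenly spaced candidates between lo and hi.
    return lo[None] + steps[:, None, None] * (hi - lo)[None]
```

For example, with a constant 10 m prior and unit velocity, the candidates span 9-11 m; a faster camera would tighten that band around the prior, letting the cost volume spend its fixed number of hypotheses on a finer depth resolution.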