We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry to produce dense depth estimates with metric scale. Our approach performs global scale and shift alignment against sparse metric depth, followed by learning-based dense alignment. We evaluate on the TartanAir and VOID datasets, observing up to 30% reduction in inverse RMSE (iRMSE) with dense scale alignment relative to global alignment alone. Our approach is especially competitive at low density: with just 150 sparse metric depth points, our dense-to-dense depth alignment method achieves over 50% lower iRMSE than sparse-to-dense depth completion by KBNet, currently the state of the art on VOID. We demonstrate successful zero-shot transfer from synthetic TartanAir to real-world VOID data and perform generalization tests on NYUv2 and VCU-RVI. Our approach is modular and compatible with a variety of monocular depth estimation models. Video: https://youtu.be/IMwiKwSpshQ Code: https://github.com/isl-org/VI-Depth
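To illustrate the first stage described above, the sketch below shows one common way to perform global scale and shift alignment: fit a single scale and shift, in closed form by least squares, so that the dense monocular inverse depth prediction matches the sparse metric inverse depth at the pixels where it is available. This is a minimal illustration under assumed array conventions, not the released implementation; the function name and arguments are hypothetical.

```python
import numpy as np

def global_scale_shift_align(pred_inv_depth, sparse_inv_depth, valid_mask):
    """Least-squares fit of (scale, shift) on inverse depth.

    pred_inv_depth:   HxW dense inverse depth from a monocular model (up to scale/shift)
    sparse_inv_depth: HxW array with metric inverse depth at sparse points, 0 elsewhere
    valid_mask:       HxW boolean mask marking the sparse metric points
    """
    x = pred_inv_depth[valid_mask].astype(np.float64)    # predictions at sparse points
    y = sparse_inv_depth[valid_mask].astype(np.float64)  # metric targets at those points
    # Solve argmin_{s,t} || s*x + t - y ||^2 via linear least squares.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    # Apply the fitted global transform to the full dense prediction.
    return scale * pred_inv_depth + shift, scale, shift
```

In the pipeline, the globally aligned inverse depth would then be refined by the learning-based dense (per-pixel) alignment stage described in the paper.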