Monocular visual odometry is the estimation of an agent's position from the images of a single camera, with applications in autonomous vehicles, medical robots, and augmented reality. However, monocular systems suffer from scale ambiguity because 2D frames provide no depth information. This paper contributes an application of the dense prediction transformer model to scale estimation in monocular visual odometry systems. Experimental results show that the scale drift of monocular systems can be reduced through the accurate depth maps estimated by this model, achieving performance competitive with the state of the art on a visual odometry benchmark.
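To make the idea concrete, the following is a minimal sketch of one common way a predicted dense depth map can resolve the scale of a monocular relative pose: triangulated landmark depths (known only up to scale) are aligned to the network-predicted depths at the same pixels, and the recovered factor rescales the unit-norm translation. The function name, the median-ratio alignment, and the OpenCV-based pipeline are illustrative assumptions, not the paper's exact method.

```python
# Illustrative sketch (not the paper's implementation): scale recovery for
# monocular VO from a learned dense depth map such as one produced by a
# dense prediction transformer.
import numpy as np
import cv2


def relative_pose_with_depth_scale(kpts0, kpts1, K, depth0):
    """Estimate a scaled relative pose between two consecutive frames.

    kpts0, kpts1 : (N, 2) float arrays of matched pixel coordinates
    K            : (3, 3) camera intrinsics
    depth0       : (H, W) depth map predicted for frame 0 (assumed to be
                   metric, or at least consistently scaled across frames)
    """
    # Up-to-scale relative pose from the essential matrix.
    E, inliers = cv2.findEssentialMat(kpts0, kpts1, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, kpts0, kpts1, K, mask=inliers)

    # Triangulate inlier correspondences; their depths carry an arbitrary scale.
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = K @ np.hstack([R, t])
    good = mask.ravel() > 0
    pts4d = cv2.triangulatePoints(P0, P1, kpts0[good].T, kpts1[good].T)
    z_triangulated = pts4d[2] / pts4d[3]  # depth in frame 0, arbitrary scale

    # Look up the predicted depth at the same pixels and solve for the factor
    # aligning the two depth sets; the median ratio is robust to outliers.
    u = np.clip(kpts0[good, 0].round().astype(int), 0, depth0.shape[1] - 1)
    v = np.clip(kpts0[good, 1].round().astype(int), 0, depth0.shape[0] - 1)
    z_predicted = depth0[v, u]
    valid = (z_triangulated > 0) & (z_predicted > 0)
    scale = np.median(z_predicted[valid] / z_triangulated[valid])

    # Apply the recovered scale to the unit-norm translation.
    return R, scale * t
```

Rescaling each frame-to-frame translation this way keeps the trajectory's scale anchored to the depth network rather than drifting with accumulated geometric error.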