Building vehicles capable of operating without human supervision requires determining the agent's pose. Visual Odometry (VO) algorithms estimate the egomotion using only the visual changes between input images. The most recent VO methods rely extensively on deep learning with convolutional neural networks (CNNs), which adds substantial cost when dealing with high-resolution images. Furthermore, in VO tasks, more input data does not imply a better prediction; on the contrary, the architecture may need to filter out useless information. Therefore, computationally efficient and lightweight architectures are essential. In this work, we propose RAM-VO, an extension of the Recurrent Attention Model (RAM) to visual odometry tasks. RAM-VO improves the visual and temporal representation of information and employs the Proximal Policy Optimization (PPO) algorithm to learn robust policies. The results indicate that RAM-VO can perform regression with six degrees of freedom from monocular input images using approximately 3 million parameters. In addition, experiments on the KITTI dataset demonstrate that RAM-VO achieves competitive results using only 5.7% of the available visual information.
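To illustrate how an attention-based VO model can operate on a small fraction of the visual input, the sketch below implements a RAM-style multi-scale glimpse sensor: it crops concentric patches of increasing size around an attended location and downsamples them all to a common resolution. This is a minimal NumPy sketch under assumed parameters (patch size, number of scales, function name); it is not the paper's implementation.

```python
import numpy as np

def extract_glimpse(image, center, patch_size=32, num_scales=3):
    """Crop `num_scales` concentric patches around `center`, each
    doubling in side length, and downsample all of them to
    patch_size x patch_size. Hypothetical parameters for illustration.
    `image` is a 2-D grayscale array; `center` is (row, col)."""
    glimpses = []
    for s in range(num_scales):
        side = patch_size * (2 ** s)   # patch side length at this scale
        half = side // 2
        # Pad with edge values so crops near the border stay valid.
        padded = np.pad(image, ((half, half), (half, half)), mode="edge")
        cy, cx = center[0] + half, center[1] + half
        patch = padded[cy - half:cy + half, cx - half:cx + half]
        # Naive strided downsampling back to patch_size x patch_size.
        stride = side // patch_size
        glimpses.append(patch[::stride, ::stride])
    return np.stack(glimpses)  # shape: (num_scales, patch_size, patch_size)
```

With these assumed sizes, the largest crop covers a 128x128 window, so even on a full KITTI-resolution frame the sensor reads only a few percent of the pixels per glimpse, which is the "less is more" idea behind RAM-VO's low parameter count.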