In the field of visual ego-motion estimation for Micro Air Vehicles (MAVs), fast maneuvers remain challenging, mainly because of the large visual disparity and motion blur they induce. In pursuit of higher robustness, we study convolutional neural networks (CNNs) that predict the relative pose between subsequent images from a fast-moving monocular camera facing a planar scene. Aided by the Inertial Measurement Unit (IMU), we focus mainly on translational motion. The networks we study share a small model size (around 1.35 MB) and a high inference speed (around 100 Hz on a mobile GPU). The images used for training and testing exhibit realistic motion blur. Starting from a network framework that iteratively warps the first image to match the second with cascaded network blocks, we study different network architectures and training strategies. Simulated datasets and MAV flight datasets are used for evaluation. The proposed setup achieves higher accuracy than existing networks and traditional feature-point-based methods during fast maneuvers. Moreover, self-supervised learning outperforms supervised learning. The code developed for this paper will be open-sourced upon publication at https://github.com/tudelft/.
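To make the cascaded-warping idea concrete, the following is a minimal PyTorch sketch of one plausible realization: each block predicts a residual translation from the concatenated image pair, the first image is re-warped toward the second with the accumulated estimate, and the next block refines it. The block structure, layer sizes, and the simplified image-plane shift used for warping are illustrative assumptions, not the released implementation; a planar scene would call for a full homography warp.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseBlock(nn.Module):
    """One refinement block: grayscale image pair -> residual translation."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, 3)  # residual (tx, ty, tz); assumed output

    def forward(self, img_pair):
        return self.fc(self.encoder(img_pair).flatten(1))

def warp_translation(img, t):
    """Warp an image by an image-plane shift derived from the predicted
    translation. Simplification: an affine shift stands in for the
    planar-scene homography used in the full method."""
    b = img.size(0)
    theta = torch.zeros(b, 2, 3, device=img.device)
    theta[:, 0, 0] = 1.0
    theta[:, 1, 1] = 1.0
    theta[:, 0, 2] = t[:, 0]  # horizontal shift (normalized coordinates)
    theta[:, 1, 2] = t[:, 1]  # vertical shift
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)

class CascadedEgoMotion(nn.Module):
    """Cascade of blocks that iteratively warp img1 to match img2."""
    def __init__(self, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(PoseBlock() for _ in range(n_blocks))

    def forward(self, img1, img2):  # each (B, 1, H, W), grayscale
        t_total = torch.zeros(img1.size(0), 3, device=img1.device)
        warped = img1
        for block in self.blocks:
            dt = block(torch.cat([warped, img2], dim=1))
            t_total = t_total + dt
            # Re-warp from the original image with the accumulated pose,
            # avoiding compounded interpolation error across blocks.
            warped = warp_translation(img1, t_total)
        return t_total
```

Re-warping from the original first image with the accumulated estimate, rather than warping the already-warped image repeatedly, keeps interpolation artifacts from compounding across the cascade.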