Learning-based visual ego-motion estimation is promising yet not ready for navigating agile mobile robots in the real world. In this article, we propose CUAHN-VIO, a robust and efficient monocular visual-inertial odometry (VIO) system designed for micro aerial vehicles (MAVs) equipped with a downward-facing camera. The vision frontend is a content-and-uncertainty-aware homography network (CUAHN) that is robust to non-homography image content and to failure cases of the network prediction. It not only predicts the homography transformation but also estimates its uncertainty. Training is self-supervised and therefore requires no ground truth, which is often difficult to obtain. The network generalizes well, enabling "plug-and-play" deployment in new environments without fine-tuning. A lightweight extended Kalman filter (EKF) serves as the VIO backend and uses the mean prediction and variance estimate from the network for visual measurement updates. CUAHN-VIO is evaluated on a high-speed public dataset and achieves accuracy rivaling that of state-of-the-art (SOTA) VIO approaches. Thanks to its robustness to motion blur, low network inference time (~23 ms), and stable processing latency (~26 ms), CUAHN-VIO successfully runs onboard an Nvidia Jetson TX2 embedded processor to navigate a fast autonomous MAV.
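To illustrate how a filter backend can consume both a predicted mean and a predicted variance, the following is a minimal sketch of a single Kalman measurement update in which the measurement noise covariance is built from per-measurement variances. It is not the CUAHN-VIO implementation: the function name `kalman_update`, the two-element state, the identity measurement matrix `H`, and all numeric values are assumptions chosen for illustration, and the measurement model is linear rather than the nonlinear one an EKF would linearize.

```python
import numpy as np


def kalman_update(x, P, z, var, H):
    """One linear Kalman measurement update.

    x   : prior state mean, shape (n,)
    P   : prior state covariance, shape (n, n)
    z   : measurement mean (e.g. a network's mean prediction), shape (m,)
    var : per-measurement variances (e.g. a network's variance estimate), shape (m,)
    H   : measurement matrix, shape (m, n)  (hypothetical linear model)
    """
    R = np.diag(var)                      # measurement noise from predicted variance
    y = z - H @ x                         # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = np.linalg.solve(S, H @ P).T       # Kalman gain P H^T S^-1 (P, S symmetric)
    x_new = x + K @ y
    I_KH = np.eye(len(x)) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T   # Joseph form, keeps P symmetric
    return x_new, P_new


# Illustrative values only: the first measurement is reported as confident,
# the second as highly uncertain.
x0 = np.zeros(2)
P0 = np.eye(2)
H = np.eye(2)
z = np.array([1.0, 1.0])
var = np.array([0.01, 100.0])

x1, P1 = kalman_update(x0, P0, z, var, H)
print(x1)
print(np.diag(P1))
```

In this toy setup the state component observed with low variance moves almost all the way to its measurement, while the component observed with high variance barely changes. This is the mechanism by which a variance estimate lets a filter discount unreliable visual measurements.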