Visual-inertial odometry (VIO) systems traditionally rely on filtering-based or optimization-based techniques for egomotion estimation. While these methods are accurate under nominal conditions, they are prone to failure during severe illumination changes, under rapid camera motion, or on low-texture image sequences. Learning-based systems have the potential to outperform classical implementations in challenging environments but currently do not perform as well as classical methods in nominal settings. Herein, we introduce a framework for training a hybrid VIO system that leverages the advantages of both learning and standard filtering-based state estimation. Our approach is built upon a differentiable Kalman filter, with an IMU-driven process model and a robust, neural network-derived relative pose measurement model. The use of the Kalman filter framework enables the principled treatment of uncertainty at training time and at test time. We show that our self-supervised loss formulation outperforms a similar, supervised method, while also enabling online retraining. We evaluate our system on a visually degraded version of the EuRoC dataset and find that our estimator operates without a significant reduction in accuracy in cases where classical estimators consistently diverge. Finally, by properly utilizing the metric information contained in the IMU measurements, our system is able to recover metric scene scale, while other self-supervised monocular VIO approaches cannot.
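To make the filter structure concrete, the following is a minimal one-dimensional sketch of a Kalman filter predict/update cycle in which an IMU-style acceleration drives the process model and a relative-displacement measurement (standing in for a network output) drives the update. It is an illustration only, not the system described above: the scalar velocity state, the function names `predict` and `update`, and the parameters `q` and `r` are all assumptions made for this example. Every operation is plain arithmetic, so the same steps would remain differentiable if written in an automatic differentiation framework.

```python
# Toy scalar Kalman filter: state x is velocity along one axis (assumption).
# This is a hypothetical illustration, not the paper's actual state or models.

def predict(x, P, accel, dt, q):
    """Process model: integrate an IMU-style acceleration over dt.

    x, P  : prior state estimate and its variance
    accel : measured acceleration (assumed bias-free in this sketch)
    q     : process noise variance added per step
    """
    x_pred = x + accel * dt
    P_pred = P + q
    return x_pred, P_pred


def update(x, P, z, r, dt):
    """Measurement model: z is a relative displacement over dt, z = x * dt + noise.

    r is the measurement variance; in a learned system it could be predicted
    alongside z, which is how uncertainty would enter the update (assumption).
    """
    H = dt                      # measurement Jacobian for z = x * dt
    S = H * P * H + r           # innovation variance
    K = P * H / S               # Kalman gain
    x_new = x + K * (z - H * x)
    P_new = (1.0 - K * H) * P
    return x_new, P_new


# One filter step with made-up numbers.
x, P = predict(x=0.0, P=1.0, accel=1.0, dt=1.0, q=0.0)
x, P = update(x, P, z=2.0, r=1.0, dt=1.0)
print(x, P)
```

A large `r` shrinks the gain `K`, so an unreliable visual measurement (for example, from a degraded image) moves the estimate very little and the IMU-driven prediction dominates.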