In this work, we present a lightweight, tightly coupled deep depth network and visual-inertial odometry (VIO) system that provides accurate state estimates and dense depth maps of the immediate surroundings. Leveraging the proposed lightweight Conditional Variational Autoencoder (CVAE) for depth inference and encoding, we provide the network with previously marginalized sparse features from VIO to increase the accuracy of the initial depth prediction and the network's generalization capability. The compact encoded depth maps are then updated jointly with the navigation states in a sliding-window estimator to provide the dense local scene geometry. We also propose a novel method to obtain the CVAE's Jacobian, which is shown to be more than an order of magnitude faster than in previous works, and we leverage First-Estimate Jacobians (FEJ) to avoid recalculating it. As opposed to previous works that rely on completely dense residuals, we propose to provide only sparse measurements to update the depth code, and we show through careful experimentation that our choice of sparse measurements and FEJs can still significantly improve the estimated depth maps. Our full system also exhibits state-of-the-art pose estimation accuracy, and we show that it can run in real time with single-threaded execution while utilizing GPU acceleration only for the network and code Jacobian.
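To make the idea of updating a compact depth code from sparse measurements with a first-estimate Jacobian more concrete, the following is a minimal sketch under stated assumptions, not the system described above. The toy decoder `decode` (depth = exp(b + W c)), the names `decoder_jacobian`, `J_fej`, `sigma`, and all dimensions are hypothetical stand-ins for the CVAE decoder and its Jacobian. The sketch updates only the code, whereas the abstract describes a joint update with the navigation states in a sliding window.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, code_dim, n_sparse = 400, 8, 40

# Hypothetical toy decoder standing in for the CVAE decoder: depth = exp(b + W c).
W = 0.1 * rng.standard_normal((n_pixels, code_dim))
b = np.log(2.0)

def decode(code):
    """Flattened dense depth map predicted from a compact code."""
    return np.exp(b + W @ code)

def decoder_jacobian(code):
    """Analytic d(depth)/d(code) of the toy decoder, shape (n_pixels, code_dim)."""
    return decode(code)[:, None] * W

# Simulated ground truth and sparse depth measurements at a few pixels.
true_code = 0.5 * rng.standard_normal(code_dim)
true_depth = decode(true_code)
sparse_idx = rng.choice(n_pixels, size=n_sparse, replace=False)
z = true_depth[sparse_idx]
sigma = 0.05  # assumed measurement standard deviation

# First estimate of the code (the zero prior mean) and its Jacobian, computed once.
code = np.zeros(code_dim)
J_fej = decoder_jacobian(code)[sparse_idx]
# Gauss-Newton information matrix with an assumed unit-Gaussian prior on the code.
H = J_fej.T @ J_fej / sigma**2 + np.eye(code_dim)

rmse_before = np.sqrt(np.mean((decode(code) - true_depth) ** 2))
for _ in range(5):
    residual = z - decode(code)[sparse_idx]     # re-evaluated at the current estimate
    gradient = J_fej.T @ residual / sigma**2 - code
    code = code + np.linalg.solve(H, gradient)  # the Jacobian is never recomputed
rmse_after = np.sqrt(np.mean((decode(code) - true_depth) ** 2))

print(f"dense depth RMSE: {rmse_before:.4f} -> {rmse_after:.4f}")
```

In this toy setting, only 40 of the 400 pixels carry measurements, yet the whole dense map changes because every pixel depends on the same low-dimensional code. Reusing `J_fej` trades some linearization accuracy for skipping the Jacobian evaluation at each iteration.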