Self-supervised multi-frame depth estimation achieves high accuracy by computing matching costs of pixel correspondences between adjacent frames, which injects geometric information into the network. These pixel-correspondence candidates are computed from the relative pose estimates between the frames. Accurate pose predictions are essential for precise matching cost computation because they determine the epipolar geometry. Furthermore, improved depth estimates can, in turn, be used to align pose estimates. Inspired by traditional structure-from-motion (SfM) principles, we propose the DualRefine model, which tightly couples depth and pose estimation through a feedback loop. Our novel update pipeline uses a deep equilibrium model framework to iteratively refine depth estimates and a hidden state of feature maps by computing local matching costs based on epipolar geometry. Importantly, we use the refined depth estimates and feature maps to compute pose updates at each step. These pose updates gradually alter the epipolar geometry during the refinement process. Experimental results on the KITTI dataset show competitive depth and odometry prediction performance that surpasses published self-supervised baselines.
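To make the dependence of the correspondence candidates on the pose estimate concrete, the following is a minimal sketch and not the DualRefine implementation. It assumes a pinhole camera with hypothetical intrinsics `(fx, fy, cx, cy)` and a relative pose `(R, t)` mapping target-frame points into the source frame; all function names are illustrative. For one target pixel, each depth hypothesis is back-projected, transformed by the pose, and re-projected, so the candidates trace out the epipolar line implied by that pose.

```python
# Hypothetical helper names; plain Python lists are used instead of a tensor library.

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) at the given depth to a 3D point in the target camera frame."""
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]


def transform(R, t, p):
    """Apply the rigid transform p -> R p + t (target frame to source frame)."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def project(p, fx, fy, cx, cy):
    """Project a 3D point in a camera frame to pixel coordinates (pinhole model)."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


def correspondence_candidates(u, v, depths, R, t, intrinsics):
    """Return one source-frame pixel per depth hypothesis for target pixel (u, v).

    A matching cost would then compare features at (u, v) in the target frame
    with features sampled at each returned location in the source frame.
    """
    fx, fy, cx, cy = intrinsics
    return [
        project(transform(R, t, backproject(u, v, d, fx, fy, cx, cy)), fx, fy, cx, cy)
        for d in depths
    ]


# Assumed example values: identity rotation and a translation that moves
# target-frame points 1 unit closer along the optical axis.
intrinsics = (100.0, 100.0, 50.0, 50.0)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, -1.0]
candidates = correspondence_candidates(70.0, 60.0, [2.0, 4.0, 8.0], R, t, intrinsics)
```

With this pure forward motion the epipole coincides with the principal point `(cx, cy)`, and every candidate lies on the line through it and the original pixel. Changing `R` or `t` moves that line, which is why a pose update during refinement changes where matching costs are evaluated.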