Structure from motion (SfM) has recently been formulated as a self-supervised learning problem, in which neural network models of depth and egomotion are learned jointly through view synthesis. Herein, we address the open problem of how to best couple, or link, the depth and egomotion network components, so that information such as a common scale factor can be shared between the networks. Towards this end, we introduce several notions of coupling, categorize existing approaches, and present a novel tightly-coupled approach that leverages the interdependence of depth and egomotion at training time and at test time. Our approach uses iterative view synthesis to recursively update the egomotion network input, permitting contextual information to be passed between the components. We demonstrate through extensive experiments that our approach promotes consistency between the depth and egomotion predictions at test time, improves generalization, and achieves state-of-the-art accuracy on indoor and outdoor depth and egomotion evaluation benchmarks.
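The iterative view-synthesis loop described above can be sketched in miniature. This is a hedged, toy illustration only: `egomotion_net` and `warp` are hypothetical stand-ins (a real system would use a CNN and depth-based image warping), and the scheme shown — predict a residual pose from the (warped source, target) pair, compose it with the running estimate, and re-synthesize the view — is one plausible reading of "recursively update the egomotion network input", not the paper's exact implementation.

```python
# Toy sketch of iterative view synthesis for egomotion refinement.
# All names and signals here are hypothetical stand-ins.

def egomotion_net(source, target):
    # Stand-in "network": a 1-D translation proportional to the mean
    # intensity difference between the two frames.
    return sum(t - s for s, t in zip(source, target)) / len(source)

def warp(source, pose):
    # Stand-in for depth-based view synthesis: shifts intensities by pose.
    return [s + pose for s in source]

def iterative_egomotion(source, target, iters=3):
    pose = 0.0
    warped = source
    for _ in range(iters):
        residual = egomotion_net(warped, target)  # residual pose update
        pose += residual                          # compose with running estimate
        warped = warp(source, pose)               # re-synthesize the input view
    return pose

src = [0.1, 0.2, 0.3]
tgt = [0.6, 0.7, 0.8]
print(round(iterative_egomotion(src, tgt), 6))
```

In this toy setting the residual vanishes once the warped source matches the target, illustrating how feeding the synthesized view back into the egomotion estimator drives the pose toward consistency with the depth-based warp.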