Estimating a dense depth map from a single view is geometrically ill-posed, and state-of-the-art methods rely on learning the relation between depth and visual appearance using deep neural networks. Structure from Motion (SfM), on the other hand, leverages multi-view constraints to produce very accurate but sparse maps, as matching across images is typically limited to locally discriminative texture. In this work, we combine the strengths of both approaches by proposing a novel test-time refinement (TTR) method, denoted SfM-TTR, that boosts the performance of single-view depth networks at test time using SfM multi-view cues. Specifically, and differently from the state of the art, we use sparse SfM point clouds as a test-time self-supervisory signal, fine-tuning the network encoder to learn a better representation of the test scene. Our results show that adding SfM-TTR to several state-of-the-art self-supervised and supervised networks significantly improves their performance, outperforming previous TTR baselines based mainly on photometric multi-view consistency. The code is available at https://github.com/serizba/SfM-TTR.
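To make the idea concrete, below is a minimal PyTorch sketch of one test-time refinement step under stated assumptions; it is an illustration, not the repository's implementation. We assume the network exposes an `encoder` submodule whose parameters are the only ones optimized, and that the sparse SfM points have already been reprojected into the test frame as `sfm_depth` with a validity mask `sfm_mask`; the function name `sfm_ttr_step` and all hyperparameters are hypothetical. Since SfM reconstructions are only defined up to scale, the sketch aligns the prediction to the sparse points with a simple median-ratio scale, one common way to handle this ambiguity.

```python
# Hypothetical sketch of a test-time refinement step on sparse SfM depths.
import torch

def sfm_ttr_step(depth_net, image, sfm_depth, sfm_mask, optimizer):
    """One refinement step: align scales, then supervise on sparse SfM points."""
    pred = depth_net(image)  # dense depth prediction, e.g. shape (1, 1, H, W)

    # SfM depths are up-to-scale; align the prediction to the sparse points
    # with a median-ratio scale (detached so no gradient flows through it).
    scale = (sfm_depth[sfm_mask].median() / pred[sfm_mask].median()).detach()

    # Self-supervised L1 loss evaluated only at the sparse SfM points.
    loss = torch.abs(scale * pred[sfm_mask] - sfm_depth[sfm_mask]).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# As described above, only the encoder is fine-tuned on the test scene:
#   optimizer = torch.optim.Adam(depth_net.encoder.parameters(), lr=1e-5)
#   for _ in range(num_steps):
#       sfm_ttr_step(depth_net, image, sfm_depth, sfm_mask, optimizer)
```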