Traditional depth sensors generate accurate real-world depth estimates that surpass even the most advanced learning approaches trained only on simulated data. Since ground-truth depth is readily available in the simulation domain but difficult to obtain in the real domain, we propose a method that leverages the best of both worlds. In this paper we present a new framework, ActiveZero, a mixed-domain learning solution for active stereovision systems that requires no real-world depth annotation. First, we demonstrate the transferability of our method to out-of-distribution real data by using a mixed-domain learning strategy. In the simulation domain, we use a combination of a supervised disparity loss and self-supervised losses on a shape-primitives dataset. By contrast, in the real domain, we use only self-supervised losses, on a dataset that is out-of-distribution with respect to both the simulated training data and the real test data. Second, our method introduces a novel self-supervised loss, called temporal IR reprojection, to increase the robustness and accuracy of our reprojections in hard-to-perceive regions. Finally, we show that the method can be trained end-to-end and that each module contributes to the final result. Extensive qualitative and quantitative evaluations on real data demonstrate state-of-the-art results that can even surpass a commercial depth sensor.
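The mixed-domain strategy described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the smooth-L1 disparity loss, the per-pixel squared reprojection error, the weight `w_reproj`, and all function and key names are assumptions chosen for illustration, and the temporal IR reprojection loss is not modeled. The sketch only shows the structure of the objective: simulated samples contribute a supervised term plus a self-supervised term, while real samples contribute the self-supervised term alone.

```python
import numpy as np

def smooth_l1(pred, target):
    """Mean smooth-L1 distance (assumed form of the supervised disparity loss)."""
    diff = np.abs(pred - target)
    return float(np.mean(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)))

def warp_right_to_left(right, disparity):
    """Sample the right image at x - d(x), with linear interpolation along each row."""
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)          # clamp samples that fall outside the image
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def reprojection_loss(left, right, disparity):
    """Self-supervised term: mean squared error between left and the warped right image."""
    return float(np.mean((left - warp_right_to_left(right, disparity)) ** 2))

def mixed_domain_loss(sim, real, w_reproj=1.0):
    """Combine the two domains.

    sim:  dict with 'left', 'right', 'pred_disp', 'gt_disp' (ground truth available)
    real: dict with 'left', 'right', 'pred_disp' (no depth annotation)
    """
    sim_loss = smooth_l1(sim["pred_disp"], sim["gt_disp"]) + w_reproj * reprojection_loss(
        sim["left"], sim["right"], sim["pred_disp"]
    )
    real_loss = w_reproj * reprojection_loss(real["left"], real["right"], real["pred_disp"])
    return sim_loss + real_loss
```

In a training loop, `pred_disp` would come from the stereo network and the returned scalar would be backpropagated through it; an automatic-differentiation framework would replace NumPy for that purpose.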