We present a novel method for efficiently predicting accurate depth from monocular images. This efficiency is achieved by exploiting wavelet decomposition, which is integrated into a fully differentiable encoder-decoder architecture. We demonstrate that high-fidelity depth maps can be reconstructed by predicting sparse wavelet coefficients. In contrast to previous work, we show that wavelet coefficients can be learned without direct supervision on the coefficients themselves. Instead, we supervise only the final depth image, which is reconstructed through the inverse wavelet transform. We additionally show that wavelet coefficients can be learned in fully self-supervised scenarios, without access to ground-truth depth. Finally, we apply our method to different state-of-the-art monocular depth estimation models, in each case obtaining similar or better results than the original model while requiring less than half the multiply-adds in the decoder network. Code is available at https://github.com/nianticlabs/wavelet-monodepth
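As a rough illustration of why sparse wavelet coefficients can suffice to reconstruct a depth map, the sketch below applies a single-level 2D transform and its inverse to a toy piecewise-flat depth map. It is not the released implementation: the choice of the Haar wavelet, the NumPy (non-differentiable) setting, the helper names `haar_dwt2` and `haar_idwt2`, and the toy `depth` array are all assumptions made for illustration, and no network is involved.

```python
import numpy as np

def haar_dwt2(x):
    """One level of an orthonormal 2D Haar transform (assumed wavelet).

    x must have even height and width. Returns the low-frequency band and
    a tuple of three detail bands (horizontal, vertical, diagonal differences).
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    low = (a + b + c + d) / 2.0
    horiz = (a - b + c - d) / 2.0          # difference across columns
    vert = (a + b - c - d) / 2.0           # difference across rows
    diag = (a - b - c + d) / 2.0
    return low, (horiz, vert, diag)

def haar_idwt2(low, details):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    horiz, vert, diag = details
    h, w = low.shape
    x = np.empty((2 * h, 2 * w), dtype=low.dtype)
    x[0::2, 0::2] = (low + horiz + vert + diag) / 2.0
    x[0::2, 1::2] = (low - horiz + vert - diag) / 2.0
    x[1::2, 0::2] = (low + horiz - vert - diag) / 2.0
    x[1::2, 1::2] = (low - horiz - vert + diag) / 2.0
    return x

# Toy depth map (hypothetical data): two flat surfaces meeting at a vertical edge.
depth = np.full((8, 8), 5.0)
depth[:, :3] = 2.0

low, details = haar_dwt2(depth)
nonzero = sum(int(np.count_nonzero(band)) for band in details)
total = sum(band.size for band in details)
recon = haar_idwt2(low, details)

print(f"non-zero detail coefficients: {nonzero} of {total}")
print("exact reconstruction:", np.allclose(recon, depth))
```

In this toy case the detail bands are non-zero only along the depth edge, so the half-resolution band plus a handful of detail coefficients recover the full-resolution map. In a learned setting, a loss would be computed on the output of the inverse transform rather than on the coefficients, which is the supervision scheme the abstract describes.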