We propose a novel multi-stage depth super-resolution network that progressively reconstructs high-resolution depth maps from explicit and implicit high-frequency features. The former are extracted by an efficient transformer that processes both local and global contexts, while the latter are obtained by projecting color images into the frequency domain. Both are combined with depth features through a fusion strategy within a multi-stage, multi-scale framework. Experiments on the main benchmarks, namely NYUv2, Middlebury, DIML and RGBDD, show that our approach outperforms existing methods by a large margin (~20% on NYUv2 and DIML over the contemporary work DADA at 16x upsampling), establishing a new state of the art in guided depth super-resolution.
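To make the implicit branch concrete, below is a minimal sketch of what projecting the color guidance image into the frequency domain and isolating high frequencies could look like, assuming the projection amounts to a simple FFT high-pass; the function name `highpass_frequency_features` and the cutoff parameter `radius_ratio` are illustrative and not taken from the paper.

```python
import torch
import torch.fft

def highpass_frequency_features(rgb: torch.Tensor, radius_ratio: float = 0.1) -> torch.Tensor:
    """Project an RGB guidance image into the frequency domain, suppress
    low frequencies, and return the spatial-domain high-frequency response.

    rgb: (B, 3, H, W) tensor in [0, 1].
    radius_ratio: fraction of the half-spectrum (around DC) to zero out.
    """
    B, C, H, W = rgb.shape
    # 2-D FFT per channel; shift the DC component to the spectrum center.
    spec = torch.fft.fftshift(torch.fft.fft2(rgb), dim=(-2, -1))
    # Circular mask: 1 outside a low-frequency disk centered on DC.
    yy, xx = torch.meshgrid(
        torch.arange(H, device=rgb.device) - H // 2,
        torch.arange(W, device=rgb.device) - W // 2,
        indexing="ij",
    )
    dist = torch.sqrt(yy.float() ** 2 + xx.float() ** 2)
    cutoff = radius_ratio * min(H, W) / 2
    high_mask = (dist > cutoff).to(spec.dtype)
    # Zero out low frequencies, then invert back to the spatial domain.
    filtered = torch.fft.ifft2(torch.fft.ifftshift(spec * high_mask, dim=(-2, -1)))
    return filtered.real  # edges and fine textures that guide depth upsampling

# Usage: high-frequency features for a batch of 256x256 guidance images.
rgb = torch.rand(2, 3, 256, 256)
hf = highpass_frequency_features(rgb)
print(hf.shape)  # torch.Size([2, 3, 256, 256])
```

The high-pass mask keeps exactly the components (depth discontinuities, fine textures) that are hardest to recover from the low-resolution depth map alone, which is why such features are a natural complement to the depth branch in the fusion stages.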