Estimating accurate depth from a single image is challenging because the problem is inherently ambiguous and ill-posed. While recent works design increasingly complicated and powerful networks to regress the depth map directly, we take the path of CRF optimization. Because of their computational cost, CRFs are usually applied over local neighborhoods rather than the whole graph. To leverage the potential of fully-connected CRFs (FC-CRFs), we split the input into windows and perform FC-CRF optimization within each window, which reduces the computational complexity and makes FC-CRFs feasible. To better capture the relationships between nodes in the graph, we exploit the multi-head attention mechanism to compute a multi-head potential function, which is fed to the network to output an optimized depth map. We then build a bottom-up-top-down structure in which this neural window FC-CRFs module serves as the decoder and a vision transformer serves as the encoder. Experiments demonstrate that our method significantly improves performance across all metrics on both the KITTI and NYUv2 datasets compared to previous methods. Furthermore, the proposed method can be directly applied to panorama images and outperforms all previous panorama methods on the MatterPort3D dataset. Project page: https://weihaosky.github.io/newcrfs.
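To make the windowing idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes non-overlapping square windows, randomly initialized query/key projections, and a softmax-normalized dot-product score standing in for the multi-head pairwise potential; the function names (`window_partition`, `windowed_multihead_pairwise`, `pair_counts`) and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def window_partition(feat, win):
    """Split an (H, W, C) map into non-overlapping windows of shape (win*win, C)."""
    H, W, C = feat.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = feat.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)          # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)      # (num_windows, nodes_per_window, C)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def windowed_multihead_pairwise(feat, pred, win, heads, rng):
    """Fully-connected pairwise weights inside each window, one set per head.

    feat: (H, W, C) image features used to score node pairs (assumed input).
    pred: (H, W, C) current prediction features to be refined (assumed input).
    Returns the per-window weights and the message-passing result.
    """
    H, W, C = feat.shape
    assert C % heads == 0, "C must be divisible by the number of heads"
    d = C // heads
    # Random projections stand in for learned query/key weights (assumption).
    Wq = rng.standard_normal((C, C)) / np.sqrt(C)
    Wk = rng.standard_normal((C, C)) / np.sqrt(C)

    fw = window_partition(feat, win)         # (nW, n, C)
    pw = window_partition(pred, win)         # (nW, n, C)
    nW, n, _ = fw.shape

    def split_heads(x):
        return x.reshape(nW, n, heads, d).transpose(0, 2, 1, 3)  # (nW, heads, n, d)

    q, k, v = split_heads(fw @ Wq), split_heads(fw @ Wk), split_heads(pw)
    # Every node attends to every other node, but only within its own window.
    weights = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))  # (nW, heads, n, n)
    out = (weights @ v).transpose(0, 2, 1, 3).reshape(nW, n, C)
    return weights, out

def pair_counts(H, W, win):
    """Number of node pairs for a whole-image FC graph vs. windowed FC graphs."""
    n = H * W
    return n * n, (n // (win * win)) * (win * win) ** 2
```

For an 8×8 map with 4×4 windows, `pair_counts(8, 8, 4)` gives 4096 pairs for the whole-image graph versus 1024 for the windowed graphs; in general the windowed count grows linearly with the number of pixels for a fixed window size, which is the source of the complexity reduction described above.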