Monocular depth estimation is inherently ambiguous, so global structural cues play an important role in current data-driven single-view depth estimation methods. Panorama images capture the complete spatial information of their surroundings using the equirectangular projection, which introduces large distortion. A depth estimation method for panoramas must therefore handle this distortion and extract global context from the image. In this paper, we propose an end-to-end deep network for monocular panorama depth estimation on a unit spherical surface. Specifically, we project the feature maps extracted from equirectangular images onto a unit spherical surface sampled by uniformly distributed grids, so that the decoder network can aggregate information from distortion-reduced feature maps. In addition, we propose a global cross-attention-based fusion module that fuses the feature maps from skip connections and strengthens the network's ability to capture global context. Experiments on five panorama depth estimation datasets demonstrate that the proposed method substantially outperforms previous state-of-the-art methods. All related code will be open-sourced in the upcoming days.
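
To make the spherical projection step concrete, the following is a minimal NumPy sketch of resampling an equirectangular feature map at approximately uniformly distributed points on the unit sphere. It is an illustration under stated assumptions, not the proposed network: the abstract does not specify the grid construction, so a Fibonacci lattice is swapped in here as one common way to obtain near-uniform points, and the names `fibonacci_sphere` and `sample_equirect` are hypothetical.

```python
import numpy as np

def fibonacci_sphere(n):
    """Return n approximately uniformly distributed unit vectors, shape (n, 3).

    Assumption: a Fibonacci lattice stands in for the uniformly
    distributed grid mentioned in the abstract.
    """
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                  # uniform spacing in z gives equal-area bands
    golden = (1.0 + 5.0 ** 0.5) / 2.0
    lon = 2.0 * np.pi * i / golden         # longitude advances by a golden-ratio step
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(lon), r * np.sin(lon), z], axis=1)

def sample_equirect(feat, points):
    """Bilinearly sample feat (C, H, W) at unit vectors points (N, 3).

    Returns an (N, C) array of per-point features.
    """
    _, H, W = feat.shape
    lon = np.arctan2(points[:, 1], points[:, 0])            # in [-pi, pi]
    lat = np.arcsin(np.clip(points[:, 2], -1.0, 1.0))       # in [-pi/2, pi/2]
    # Continuous pixel coordinates, with pixel centers at integer indices.
    u = (lon + np.pi) / (2.0 * np.pi) * W - 0.5
    v = (np.pi / 2.0 - lat) / np.pi * H - 0.5
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    du, dv = u - u0, v - v0
    u1, v1 = u0 + 1, v0 + 1
    # Longitude wraps around the image; latitude is clamped at the poles.
    u0, u1 = u0 % W, u1 % W
    v0, v1 = np.clip(v0, 0, H - 1), np.clip(v1, 0, H - 1)
    top = feat[:, v0, u0] * (1.0 - du) + feat[:, v0, u1] * du   # (C, N)
    bot = feat[:, v1, u0] * (1.0 - du) + feat[:, v1, u1] * du
    return (top * (1.0 - dv) + bot * dv).T

# Usage: resample a random 8-channel feature map at 1024 sphere points.
points = fibonacci_sphere(1024)
features = sample_equirect(np.random.rand(8, 64, 128), points)
```

Each equirectangular row spans the same number of pixels regardless of latitude, so rows near the poles are heavily oversampled; sampling at near-uniform sphere points gives every region of the scene a comparable number of feature vectors, which is the sense in which the resampled features are distortion-reduced.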