In recent years, self-supervised monocular depth estimation has drawn much attention, since it requires no depth annotations and achieves remarkable results on standard benchmarks. However, most existing methods focus only on either daytime or nighttime images, so their performance degrades on the other domain because of the large domain shift between daytime and nighttime images. To address this problem, in this paper we propose a two-branch network named GlocalFuse-Depth for self-supervised depth estimation on all-day images. The daytime and nighttime images of an input image pair are fed into the two branches, a CNN branch and a Transformer branch, respectively, so that both fine-grained details and global dependencies can be efficiently captured. In addition, a novel fusion module is proposed to fuse the multi-dimensional features from the two branches. Extensive experiments demonstrate that GlocalFuse-Depth achieves state-of-the-art results for all-day images on the Oxford RobotCar dataset, confirming the superiority of our method.
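To make the two-branch idea concrete, the sketch below shows one possible realization in PyTorch: a convolutional branch for the daytime image, a Transformer branch over patch embeddings for the nighttime image, and a simple concatenation-based head standing in for the fusion module. The abstract does not specify the actual architecture, so every module, layer size, and name here (`CNNBranch`, `TransformerBranch`, `FusionDepthHead`) is a hypothetical illustration, not the authors' implementation.

```python
# Minimal sketch of a two-branch (CNN + Transformer) depth network.
# All layer sizes and module names are assumptions for illustration;
# the real GlocalFuse-Depth architecture and fusion module differ.
import torch
import torch.nn as nn


class CNNBranch(nn.Module):
    """Convolutional branch: captures fine-grained local details (hypothetical layout)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):                        # (B, 3, H, W) -> (B, dim, H/4, W/4)
        return self.net(x)


class TransformerBranch(nn.Module):
    """Transformer branch: models global dependencies across image patches (hypothetical)."""
    def __init__(self, dim=64, patch=4, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patch embedding
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)

    def forward(self, x):                        # (B, 3, H, W) -> (B, dim, H/4, W/4)
        tokens = self.embed(x)                   # (B, dim, h, w)
        b, c, h, w = tokens.shape
        seq = self.encoder(tokens.flatten(2).transpose(1, 2))  # (B, h*w, dim)
        return seq.transpose(1, 2).reshape(b, c, h, w)


class FusionDepthHead(nn.Module):
    """Toy stand-in for the paper's fusion module: concatenate features, predict depth."""
    def __init__(self, dim=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, 1, 1), nn.Sigmoid(),  # normalized inverse depth
        )

    def forward(self, f_day, f_night):
        return self.fuse(torch.cat([f_day, f_night], dim=1))


if __name__ == "__main__":
    day, night = torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128)
    cnn, trans, head = CNNBranch(), TransformerBranch(), FusionDepthHead()
    depth = head(cnn(day), trans(night))         # daytime -> CNN, nighttime -> Transformer
    print(depth.shape)                           # torch.Size([2, 1, 32, 32])
```

The design point the sketch illustrates is that both branches are sized to emit feature maps of the same spatial resolution, so the fusion step can operate on aligned daytime and nighttime features; any fusion strategy richer than concatenation (e.g., attention-based weighting) would slot into the same interface.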