Depth estimation has become a crucial step for 3D reconstruction from panoramic images in recent years. Panoramic images preserve complete spatial information but introduce distortion through equirectangular projection. In this paper, we propose ACDNet, a network based on adaptively combined dilated convolutions, to predict a dense depth map from a monocular panoramic image. Specifically, we combine convolution kernels with different dilation rates to extend the receptive field in the equirectangular projection. Meanwhile, we introduce an adaptive channel-wise fusion module that summarizes the feature maps and yields diverse attention areas within the receptive field along the channel dimension. Because the adaptive channel-wise fusion module is built on channel-wise attention, the network can capture and leverage cross-channel contextual information efficiently. Finally, we conduct depth estimation experiments on three datasets (both virtual and real-world), and the results demonstrate that our proposed ACDNet substantially outperforms the current state-of-the-art (SOTA) methods. Our code and model parameters are available at https://github.com/zcq15/ACDNet.
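To make the core idea concrete, the following is a minimal NumPy sketch of combining parallel dilated convolutions with adaptive, input-dependent fusion weights. It is an illustration only, not the authors' implementation: for simplicity it uses a single channel and one scalar softmax weight per dilation branch derived from global average pooling, whereas ACDNet's fusion module operates per channel with learned attention parameters.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Single-channel 2D convolution with the given dilation rate
    (zero padding, output has the same spatial size as the input)."""
    kh, kw = kernel.shape
    pad_h, pad_w = dilation * (kh // 2), dilation * (kw // 2)
    xp = np.pad(x, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.zeros_like(x, dtype=float)
    # Accumulate each kernel tap over a shifted view of the padded input;
    # the shift stride equals the dilation, which enlarges the receptive field.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * xp[i * dilation: i * dilation + x.shape[0],
                                     j * dilation: j * dilation + x.shape[1]]
    return out

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def adaptive_combined_dilated_conv(x, kernels, dilations):
    """Run parallel dilated convolution branches, then fuse them with
    adaptive weights computed from each branch's global average response
    (a stand-in for the learned channel-wise attention in the paper)."""
    branches = [dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations)]
    weights = softmax(np.array([b.mean() for b in branches]))
    return sum(w * b for w, b in zip(weights, branches))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))                      # toy feature map
kernels = [rng.standard_normal((3, 3)) for _ in range(3)]
y = adaptive_combined_dilated_conv(x, kernels, dilations=[1, 2, 4])
print(y.shape)  # (16, 16)
```

Using different dilation rates per branch lets the same 3x3 kernel cover a wider area near the distorted polar regions of the equirectangular projection, while the softmax fusion lets the network emphasize the branch whose receptive field best matches the local content.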