In this paper, we address monocular depth estimation with deep neural networks. To enable training of deep monocular depth estimation models on diverse data sources, state-of-the-art methods adopt image-level normalization strategies to generate affine-invariant depth representations. However, learning with image-level normalization mainly emphasizes the relations between pixel representations and the global statistics of the image, such as the structure of the scene, while fine-grained depth differences may be overlooked. In this paper, we propose a novel multi-scale depth normalization method that hierarchically normalizes the depth representations based on spatial information and depth distributions. Compared with previous normalization strategies applied only at the holistic image level, the proposed hierarchical normalization effectively preserves fine-grained details and improves accuracy. We present two strategies that define the hierarchical normalization contexts in the depth domain and the spatial domain, respectively. Our extensive experiments show that the proposed normalization strategy remarkably outperforms previous normalization methods, and we set a new state of the art on five zero-shot transfer benchmark datasets.
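To make the contrast between image-level and hierarchical normalization concrete, the sketch below shows a common affine-invariant formulation (median shift and mean-absolute-deviation scale) applied globally, and an illustrative spatial-domain variant that renormalizes within non-overlapping windows at several grid resolutions. The specific function names, the choice of grid levels, and the use of PyTorch are assumptions of this sketch, not the paper's exact recipe.

```python
import torch


def affine_invariant_norm(depth: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Image-level normalization: subtract the median and divide by the mean
    absolute deviation, yielding a scale- and shift-invariant depth map.
    (One common affine-invariant formulation; details may differ in the paper.)"""
    d = depth.flatten()
    t = d.median()
    s = (d - t).abs().mean().clamp(min=eps)
    return (depth - t) / s


def spatial_hierarchical_norm(depth: torch.Tensor, levels=(1, 2, 4), eps: float = 1e-6):
    """Illustrative spatial-domain hierarchy: normalize depth independently
    inside non-overlapping windows at several grid resolutions (1x1 = global),
    so finer levels retain fine-grained local depth differences.
    The `levels` grid and window partitioning are assumptions of this sketch."""
    h, w = depth.shape
    outputs = []
    for n in levels:
        out = torch.zeros_like(depth)
        ys = torch.linspace(0, h, n + 1).long()
        xs = torch.linspace(0, w, n + 1).long()
        for i in range(n):
            for j in range(n):
                patch = depth[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = affine_invariant_norm(patch, eps)
        outputs.append(out)
    # A training loss could compare predicted and ground-truth depth under
    # every normalization context, rather than only the global one.
    return outputs
```

A depth-domain variant would instead group pixels by ranges of the depth distribution before normalizing each group; the spatial grid above is only one of the two context definitions mentioned in the abstract.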