Estimating accurate depth from a single image is an ill-posed problem: infinitely many 3D scenes can project to the same 2D image. Nevertheless, recent work based on deep convolutional neural networks has made great progress and produces plausible results. These networks are generally composed of two parts: an encoder for dense feature extraction and a decoder for predicting the desired depth. In such encoder-decoder schemes, repeated strided convolution and spatial pooling layers lower the spatial resolution of intermediate outputs, and techniques such as skip connections or multi-layer deconvolutional networks are adopted to recover the original resolution for effective dense prediction. In this paper, to guide densely encoded features more effectively toward the desired depth prediction, we propose a network architecture that uses novel local planar guidance layers located at multiple stages of the decoding phase. We show that the proposed method outperforms state-of-the-art works by a significant margin on challenging benchmarks. We also provide results from an ablation study to validate the effectiveness of the proposed method.
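The ill-posedness noted in the abstract can be illustrated with a small sketch. It assumes an ideal pinhole camera with the camera at the origin; the function names (`project`, `scale_scene`), the focal length, and the sample points are hypothetical and chosen only for illustration, and nothing here is taken from the proposed method. Scaling every 3D point by the same factor changes all depths but leaves the projected image unchanged.

```python
import math


def project(point, focal_length=1.0):
    """Pinhole projection of a camera-frame point (X, Y, Z) to image coordinates."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def scale_scene(points, s):
    """Scale every 3D point about the camera centre by the factor s."""
    return [(s * x, s * y, s * z) for (x, y, z) in points]


# Hypothetical scene: three points in front of the camera (Z > 0).
scene = [(1.0, 2.0, 4.0), (-0.5, 0.25, 2.0), (3.0, -1.0, 8.0)]
scaled = scale_scene(scene, 2.0)  # same layout, twice as far away

image_a = [project(p) for p in scene]
image_b = [project(p) for p in scaled]

# Both scenes project to the same pixels even though their depths differ.
same_image = all(
    math.isclose(ua, ub) and math.isclose(va, vb)
    for (ua, va), (ub, vb) in zip(image_a, image_b)
)
depths_a = [p[2] for p in scene]
depths_b = [p[2] for p in scaled]
```

Because the two scenes are indistinguishable in the image, a single-image depth estimator has to rely on learned priors about scene structure rather than on geometry alone.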