Estimating accurate depth from a single image is challenging because the problem is ill-posed: infinitely many 3D scenes can project to the same 2D image. However, recent works based on deep convolutional neural networks show great progress with plausible results. These networks are generally composed of two parts: an encoder for dense feature extraction and a decoder for predicting the desired depth. In such encoder-decoder schemes, repeated strided convolution and spatial pooling layers lower the spatial resolution of intermediate outputs, and techniques such as skip connections or multi-layer deconvolutional networks are adopted to recover the original resolution for effective dense prediction. In this paper, to guide densely encoded features more effectively toward the desired depth prediction, we propose a network architecture that utilizes novel local planar guidance layers located at multiple stages in the decoding phase. We show that the proposed method outperforms state-of-the-art works by a significant margin on challenging benchmarks. We also provide results from an ablation study to validate the effectiveness of the proposed method.
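
The ill-posedness mentioned above can be illustrated with a minimal sketch, assuming an ideal pinhole camera with no lens distortion. The function names (`project`, `scale_scene`), the focal length, and the sample points are hypothetical and chosen only for illustration; they are not part of the proposed method. Scaling every 3D point by the same factor changes all depths but leaves the projected image unchanged, so the image alone cannot determine absolute depth.

```python
def project(point, focal=1.0):
    """Pinhole projection of a 3D point (X, Y, Z), Z > 0, to image coordinates."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def scale_scene(points, s):
    """Scale every 3D point by s, as if the whole scene were s times larger and farther."""
    return [(s * x, s * y, s * z) for x, y, z in points]


# Hypothetical scene: three points in camera coordinates (Z is depth).
scene = [(1.0, 2.0, 4.0), (-0.5, 0.25, 2.0), (3.0, -1.0, 8.0)]
far_scene = scale_scene(scene, 2.0)

image_near = [project(p) for p in scene]
image_far = [project(p) for p in far_scene]

depths_near = [p[2] for p in scene]
depths_far = [p[2] for p in far_scene]

print(image_near == image_far)    # the two scenes produce identical images
print(depths_near == depths_far)  # but their depths differ
```

Global scale is only one family of ambiguous scenes; a learned model has to rely on priors such as object size, perspective, and texture to choose among them.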