Depth estimation is a challenging 3D reconstruction task that improves the accuracy of environment sensing. This work presents a new solution with a set of improvements that raise the quantitative and qualitative quality of depth maps compared to existing methods. Recently, convolutional neural networks (CNNs) have demonstrated an extraordinary ability to estimate depth maps from monocular videos. However, traditional CNNs do not support topological structure and can operate only on regular image regions with fixed size and weights. In contrast, graph convolutional networks (GCNs) can perform convolution on non-Euclidean data and can be applied to irregular image regions within a topological structure. Therefore, to preserve the geometric appearance and distribution of objects, we exploit GCNs in a self-supervised depth estimation model. Our model consists of two parallel auto-encoder networks. The first relies on ResNet-50 to extract features from the input image and on a multi-scale GCN to estimate the depth map. The second, based on ResNet-18, estimates the ego-motion vector (i.e., 3D pose) between two consecutive frames. The estimated 3D pose and depth map are both used to construct a target image. A combination of photometric, projection, and smoothness loss functions is used to cope with bad depth predictions and to preserve object discontinuities. In particular, our method achieves comparable and promising results, with a high prediction accuracy of 89% on the public KITTI and Make3D datasets and a 40% reduction in the number of trainable parameters compared to state-of-the-art solutions. The source code is publicly available at https://github.com/ArminMasoumian/GCNDepth.git
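
To make the contrast with regular-grid convolution concrete, the following is a minimal sketch of a single graph convolution step on an arbitrary node graph. It assumes the widely used symmetric-normalization propagation rule, ReLU(D^{-1/2} (A + I) D^{-1/2} X W); the abstract does not specify the exact layer used in the model, and the names `normalized_adjacency` and `gcn_layer`, the toy chain graph, and the feature sizes are illustrative assumptions only.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])   # add self-loops so each node keeps its own feature
    deg = a_hat.sum(axis=1)              # degrees are >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(features, adj, weight):
    """One graph convolution (assumed rule): ReLU(A_norm @ X @ W)."""
    return np.maximum(normalized_adjacency(adj) @ features @ weight, 0.0)


# Hypothetical toy graph: 4 nodes connected in a chain 0-1-2-3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))   # 8 input features per node
weight = rng.normal(size=(8, 2))     # project to 2 output features per node

out = gcn_layer(features, adj, weight)
print(out.shape)  # (4, 2)
```

Because the neighborhood is defined by the adjacency matrix rather than by a fixed kernel window, the same layer applies unchanged to irregular regions with any number of neighbors per node.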