Depth representations based on multiple near-fronto-parallel planes have demonstrated impressive results in self-supervised monocular depth estimation (MDE). However, such a representation causes discontinuities on the ground, since the ground is perpendicular to the fronto-parallel planes; this is detrimental to identifying drivable space in autonomous driving. In this paper, we propose PlaneDepth, a novel representation based on orthogonal planes, comprising vertical planes and ground planes. PlaneDepth estimates the depth distribution of an input image using a Laplacian mixture model defined over the orthogonal planes. These planes are then used to synthesize a reference view that provides the self-supervision signal. Furthermore, we find that the widely used resizing-and-cropping data augmentation breaks the orthogonality assumption, leading to inferior plane predictions. We address this problem by explicitly constructing the resizing-and-cropping transformation to rectify the predefined planes and the predicted camera pose. Moreover, we propose an augmented self-distillation loss, supervised with a bilateral occlusion mask, to improve the robustness of the orthogonal-plane representation to occlusions. Thanks to the orthogonal-plane representation, we can extract the ground plane in an unsupervised manner, which is important for autonomous driving. Extensive experiments on the KITTI dataset demonstrate the effectiveness and efficiency of our method. The code is available at https://github.com/svip-lab/PlaneDepth.
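The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of composing a per-pixel depth estimate from a mixture over predefined orthogonal planes. The `mixture_depth` helper, all tensor shapes, and the plane depth values are illustrative assumptions, and the Laplacian mixture is reduced to its component means for brevity; the official repository contains the actual method.

```python
# Minimal sketch (not the authors' code): a per-pixel depth estimate composed
# as a mixture over predefined orthogonal planes (vertical + ground).
import torch
import torch.nn.functional as F

def mixture_depth(logits, plane_depths):
    """
    logits:       (B, N, H, W) unnormalized per-pixel mixture weights,
                  one channel per predefined plane.
    plane_depths: (N, H, W) depth induced at each pixel by each plane.
    Returns the expected depth under the per-pixel mixture, shape (B, 1, H, W).
    """
    weights = F.softmax(logits, dim=1)                       # (B, N, H, W)
    return (weights * plane_depths.unsqueeze(0)).sum(dim=1, keepdim=True)

# Toy usage: 4 vertical planes at fixed depths plus 2 ground-plane hypotheses.
B, H, W = 1, 8, 16
vertical = torch.tensor([2.0, 5.0, 10.0, 30.0]).view(4, 1, 1).expand(4, H, W)
# A ground plane induces depth that shrinks toward the bottom of the image
# (nearby ground); the slope constants below are made up for illustration.
rows = torch.arange(1, H + 1, dtype=torch.float32).view(1, H, 1).expand(1, H, W)
ground = torch.cat([1.5 * (H / rows), 3.0 * (H / rows)], dim=0)
plane_depths = torch.cat([vertical, ground], dim=0)          # (6, H, W)

logits = torch.randn(B, plane_depths.shape[0], H, W)         # stand-in for network output
print(mixture_depth(logits, plane_depths).shape)             # torch.Size([1, 1, 8, 16])
```

In the self-supervised setting, the resulting depth map would be used to warp a source view into the reference view, and a photometric reconstruction error would supply the training signal.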