Self-supervised monocular depth estimation trains a monocular depth estimation (MDE) network using only RGB images, which avoids the difficulty of collecting dense ground-truth depth. Many previous works addressed this problem using depth classification or depth regression. However, depth regression tends to fall into local minima because of the bilinear interpolation search on the target view. Depth classification overcomes this problem using pre-divided depth bins, but those depth candidates lead to discontinuities in the final depth result, and using the same probability for the weighted summation of both color and depth is ambiguous. To overcome these limitations, we use predefined planes that are parallel to the ground, allowing us to automatically segment the ground and predict continuous depth for it. We further model depth as a mixture Laplace distribution, which provides a more certain objective for optimization. Previous works have shown that MDE networks use only the vertical image position of objects to estimate depth and ignore relative sizes. We address this problem for the first time in both stereo and monocular training using resize cropping data augmentation. Based on our analysis of resize cropping, we combine it with our plane definition and improve our training strategy so that the network can learn the relationship between depth and both the vertical image position and the relative size of objects. We further combine the self-distillation stage with post-processing to provide more accurate supervision and to save the extra time otherwise spent on post-processing. We conduct extensive experiments to demonstrate the effectiveness of our analysis and improvements.
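Two ideas in the abstract can be illustrated with a short sketch: why a plane parallel to the ground yields continuous depth tied to vertical image position, and what the negative log-likelihood of a mixture Laplace distribution looks like. This is not the authors' implementation. It assumes a pinhole camera whose optical axis is parallel to the ground and whose image y axis points down, and the names `ground_plane_depth`, `mixture_laplace_nll`, `fy`, `cy`, and `plane_height` are hypothetical, chosen for illustration only.

```python
import math


def ground_plane_depth(v, fy, cy, plane_height):
    """Depth along the optical axis where the ray through image row `v`
    meets a horizontal plane `plane_height` metres below the camera.

    Assumes (illustrative only) a pinhole camera with focal length `fy`
    and principal-point row `cy` in pixels, optical axis parallel to the
    ground. Rows at or above the horizon never hit the plane.
    """
    dv = v - cy
    if dv <= 0:
        return math.inf
    # A point on the plane has Y = plane_height and projects to
    # v = cy + fy * Y / Z, hence Z = fy * plane_height / (v - cy).
    return fy * plane_height / dv


def mixture_laplace_nll(x, weights, locs, scales):
    """Negative log-likelihood of `x` under a mixture of Laplace
    densities: sum_k w_k / (2 b_k) * exp(-|x - mu_k| / b_k).

    Components with zero weight are skipped; log-sum-exp is used for
    numerical stability.
    """
    log_terms = [
        math.log(w) - math.log(2.0 * b) - abs(x - mu) / b
        for w, mu, b in zip(weights, locs, scales)
        if w > 0
    ]
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))
```

Under these assumptions, depth on the ground plane varies smoothly with the row index `v` instead of jumping between fixed depth bins, and a single-component mixture reduces to the ordinary Laplace negative log-likelihood.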