Depth estimation is a cornerstone of many applications that require 3D assessment of the environment, such as robotics, augmented reality, and autonomous driving. One prominent technique for depth estimation is stereo matching, which has several advantages: it is considered more accessible than other depth-sensing technologies, can produce dense depth estimates in real time, and has benefited greatly from advances in deep learning in recent years. However, current techniques for depth estimation from stereoscopic images still suffer from a built-in drawback. To reconstruct depth, a stereo matching algorithm first estimates the disparity map between the left and right images and then applies geometric triangulation. A simple analysis reveals that the depth error grows quadratically with the object's distance. Therefore, a constant disparity error translates into a large depth error for objects far from the camera. To mitigate this quadratic relation, we propose a simple but effective method that uses a refinement network for depth estimation. We show analytical and empirical results suggesting that the proposed learning procedure reduces this quadratic relation. We evaluate the proposed refinement procedure on well-known benchmarks, including the Sceneflow and KITTI datasets, and demonstrate significant improvements in the depth accuracy metric.
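The quadratic relation mentioned above follows from standard rectified-stereo triangulation, Z = f·B / d, whose first-order error is ΔZ ≈ Z²·Δd / (f·B). The following is a minimal sketch of that relation only, not of the proposed refinement network. The function names and the focal length, baseline, and disparity error values are illustrative assumptions and are not taken from the paper.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth (meters) from disparity (pixels) for a rectified stereo pair."""
    return focal_px * baseline_m / disparity_px


def exact_depth_error(depth_m, disparity_error_px, focal_px, baseline_m):
    """Depth error when the true disparity is underestimated by disparity_error_px."""
    true_disparity = focal_px * baseline_m / depth_m
    estimated_depth = depth_from_disparity(
        true_disparity - disparity_error_px, focal_px, baseline_m
    )
    return estimated_depth - depth_m


def first_order_depth_error(depth_m, disparity_error_px, focal_px, baseline_m):
    """First-order approximation: error ~ Z^2 * delta_d / (f * B)."""
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)


if __name__ == "__main__":
    # Hypothetical camera parameters, chosen only for illustration.
    focal_px, baseline_m, disparity_error_px = 1000.0, 0.5, 0.5
    for depth_m in (5.0, 10.0, 20.0, 40.0):
        exact = exact_depth_error(depth_m, disparity_error_px, focal_px, baseline_m)
        approx = first_order_depth_error(depth_m, disparity_error_px, focal_px, baseline_m)
        print(f"Z = {depth_m:5.1f} m  exact = {exact:.3f} m  first-order = {approx:.3f} m")
```

Under these assumed values, doubling the distance roughly quadruples the depth error caused by the same half-pixel disparity error, which is the behavior the refinement procedure is meant to mitigate.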