Existing approaches to depth or disparity estimation output a distribution over a set of pre-defined discrete values. This leads to inaccurate results when the true depth or disparity does not match any of these values. Moreover, this distribution is usually learned indirectly through a regression loss, which causes further problems in ambiguous regions around object boundaries. We address these issues with a new neural network architecture that can output arbitrary depth values, and a new loss function derived from the Wasserstein distance between the true and predicted distributions. We validate our approach on a variety of tasks, including stereo disparity estimation, depth estimation, and downstream 3D object detection. Our approach drastically reduces the error in ambiguous regions, especially around object boundaries, which greatly affect the localization of objects in 3D, and achieves state-of-the-art results in 3D object detection for autonomous driving. Our code will be available at https://github.com/Div99/W-Stereo-Disp.
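To make the two ideas above concrete, the following is a minimal illustrative sketch, not the released implementation. It relies on one standard fact: the Wasserstein-p distance (raised to the power p) between a discrete distribution and a point mass at the true value is the expected p-th power of the absolute difference. The function names, the toy disparity values, and the per-bin `offsets` used to move the support off the fixed grid are all assumptions made for illustration.

```python
import numpy as np

def wasserstein_to_point(probs, support, target, p=1):
    """W_p^p between a discrete distribution and a point mass at `target`.

    With a point-mass target there is only one transport plan (move all
    mass to `target`), so the cost is sum_i probs[i] * |support[i] - target|^p.
    """
    probs = np.asarray(probs, dtype=float)
    support = np.asarray(support, dtype=float)
    return float(np.sum(probs * np.abs(support - target) ** p))

def mean_regression_error(probs, support, target):
    """Absolute error of the distribution's mean, i.e. a regression loss
    that supervises the distribution only indirectly."""
    probs = np.asarray(probs, dtype=float)
    support = np.asarray(support, dtype=float)
    return float(abs(np.sum(probs * support) - target))

# Ambiguous pixel near an object boundary (hypothetical values):
# half the mass on the foreground (10) and half on the background (40).
probs, support = [0.5, 0.5], [10.0, 40.0]
print(mean_regression_error(probs, support, target=25.0))  # 0.0
print(wasserstein_to_point(probs, support, target=25.0))   # 15.0

# Fixed grid versus a grid shifted by (hypothetical) per-bin offsets.
grid = np.array([1.0, 2.0])
offsets = np.array([0.5, 0.0])
peaked = [1.0, 0.0]
print(wasserstein_to_point(peaked, grid, target=1.5))            # 0.5
print(wasserstein_to_point(peaked, grid + offsets, target=1.5))  # 0.0
```

In the first example the mean-based error is zero even though the distribution places no mass near the target, whereas the Wasserstein cost still penalizes the bimodal shape. In the second, no distribution on the fixed grid can reach a target of 1.5 exactly, while shifting the support by an offset can.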