Depth estimation has attracted wide interest in the computer vision community, yet recovering an accurate depth map from a single RGB image remains difficult. In this work, we observe that existing methods tend to exhibit asymmetric errors, which may open up a new direction for accurate and robust depth estimation. We carefully investigate this phenomenon and construct a two-level ensemble scheme, NENet, that integrates multiple predictions from diverse base predictors. NENet forms a more reliable depth estimator that substantially improves on the base predictors. To the best of our knowledge, this is the first attempt to introduce ensemble learning to monocular depth estimation and to evaluate its utility. Extensive experiments demonstrate that NENet achieves better results than previous state-of-the-art approaches on the NYU-Depth-v2 and KITTI datasets. In particular, our method lowers the RMSE on the NYU dataset from 0.365, the previous state of the art, to 0.349. To validate generalizability across cameras, we directly apply the models trained on the NYU dataset to the SUN RGB-D dataset without any fine-tuning and achieve superior results, which indicates strong generalizability. The source code and trained models will be made publicly available upon acceptance.
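The intuition that asymmetric errors make ensembling worthwhile can be illustrated with a toy example. The sketch below is not the NENet scheme, whose two-level design is not detailed in this abstract; it swaps in plain per-pixel averaging, and the depth values, the function names `rmse` and `average_ensemble`, and the fixed biases are all assumptions made for illustration. It shows that when one base predictor overestimates depth and another underestimates it, their average can have a lower RMSE than either one.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length lists of depths."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared) / len(squared))


def average_ensemble(predictions):
    """Per-pixel mean of several depth predictions (simple averaging,
    used here as a stand-in for a learned ensemble)."""
    return [sum(values) / len(values) for values in zip(*predictions)]


# Hypothetical ground-truth depths in metres for four pixels.
ground_truth = [1.0, 2.0, 3.0, 4.0]

# Two hypothetical base predictors with opposite-signed biases.
over_estimator = [d + 0.25 for d in ground_truth]    # always too deep
under_estimator = [d - 0.125 for d in ground_truth]  # always too shallow

fused = average_ensemble([over_estimator, under_estimator])

rmse_over = rmse(over_estimator, ground_truth)
rmse_under = rmse(under_estimator, ground_truth)
rmse_fused = rmse(fused, ground_truth)

print(f"over: {rmse_over:.4f}  under: {rmse_under:.4f}  fused: {rmse_fused:.4f}")
```

In this toy setting the opposite biases partly cancel, so the fused error is smaller than both base errors. Real predictors have spatially varying errors, which is presumably why a learned combination is preferred over a fixed average.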