Recent deep learning approaches for multi-view depth estimation operate either in a depth-from-video or a multi-view stereo setting. Despite the different settings, these approaches are technically similar: they correlate multiple source views with a keyview to estimate a depth map for the keyview. In this work, we introduce the Robust Multi-View Depth Benchmark, which is built upon a set of public datasets and allows evaluation in both settings on data from different domains. We evaluate recent approaches and find imbalanced performance across domains. Further, we consider a third setting, where camera poses are available and the objective is to estimate the corresponding depth maps at their correct scale. We show that recent approaches do not generalize across datasets in this setting, because their cost volume output runs out of distribution. To resolve this, we present the Robust MVD Baseline model for multi-view depth estimation, which is built upon existing components but employs a novel scale augmentation procedure. It can be applied for robust multi-view depth estimation, independent of the target data. We provide code for the proposed benchmark and baseline model at https://github.com/lmb-freiburg/robustmvd.
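For concreteness, the correlation referred to above is commonly realized as a plane-sweep cost volume: source-view features are warped into the keyview at a set of hypothesized depths and compared against the keyview features. The following PyTorch sketch illustrates this standard construction; the function names, unbatched intrinsics/pose, and the dot-product comparison are illustrative assumptions, not the exact implementation of the benchmark or of any evaluated model.

```python
# Minimal plane-sweep correlation sketch (illustrative, not the benchmark's API).
import torch
import torch.nn.functional as F

def warp_to_keyview(src_feat, K, T_key_to_src, depth):
    """Warp source features into the keyview at one hypothesized depth plane."""
    B, C, H, W = src_feat.shape
    # Keyview pixel grid in homogeneous coordinates.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()   # (3, H, W)
    # Back-project to 3D at the hypothesized depth, move into the source frame.
    cam = torch.linalg.inv(K) @ pix.reshape(3, -1) * depth            # (3, H*W)
    cam_h = torch.cat([cam, torch.ones(1, cam.shape[1])], dim=0)      # (4, H*W)
    src = K @ (T_key_to_src @ cam_h)[:3]                              # (3, H*W)
    uv = src[:2] / src[2:].clamp(min=1e-6)                            # (2, H*W)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    uv = uv.T.reshape(1, H, W, 2)
    uv[..., 0] = 2 * uv[..., 0] / (W - 1) - 1
    uv[..., 1] = 2 * uv[..., 1] / (H - 1) - 1
    return F.grid_sample(src_feat, uv.expand(B, -1, -1, -1), align_corners=True)

def plane_sweep_correlation(key_feat, src_feat, K, T_key_to_src, depth_hyps):
    """Correlate keyview features with warped source features per depth hypothesis."""
    cost = []
    for d in depth_hyps:
        warped = warp_to_keyview(src_feat, K, T_key_to_src, d)
        cost.append((key_feat * warped).mean(dim=1))  # dot-product correlation
    return torch.stack(cost, dim=1)  # (B, num_depths, H, W) cost volume
```

With multiple source views, the per-view cost volumes are typically averaged or otherwise fused before being decoded into a depth map for the keyview.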
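The scale augmentation named above can be sketched as follows: during training, the ground-truth depth and the pose translations are rescaled by a common random factor, so the network sees absolute scales well beyond any single dataset's range and its cost volume statistics are less likely to fall out of distribution at test time. The function name and the sampling range below are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of joint depth/translation scale augmentation (assumed details).
import numpy as np

def scale_augment(depth, poses, rng, min_scale=0.1, max_scale=10.0):
    """Consistently rescale a depth map and the translations of camera poses.

    depth: (H, W) ground-truth depth map
    poses: list of 4x4 camera matrices; translations live in pose[:3, 3]
    rng:   a numpy Generator, e.g. np.random.default_rng()
    """
    s = rng.uniform(min_scale, max_scale)  # sampling scheme is an assumption
    scaled_poses = []
    for T in poses:
        T = T.copy()
        T[:3, 3] *= s  # scaling translations together with depth keeps the geometry consistent
        scaled_poses.append(T)
    return depth * s, scaled_poses
```

Because the multiplicative rescaling leaves the reprojection geometry unchanged, the augmented samples remain valid supervision while decoupling the learned depth regression from any dataset-specific absolute scale.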