Depth estimation from images now achieves outstanding results, both in in-domain accuracy and in generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. To address them, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featuring scenes containing several specular and transparent surfaces. Our acquisition pipeline leverages a novel deep space-time stereo framework, enabling easy and accurate labeling with sub-pixel precision. The dataset is composed of 606 samples collected in 85 different scenes. Each sample includes both a high-resolution pair (12 Mpx) and an unbalanced stereo pair (Left: 12 Mpx, Right: 1.1 Mpx). Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. We divide the dataset into a training set and two testing sets; the latter are devoted to evaluating stereo and monocular depth estimation networks, respectively, to highlight the open challenges and future research directions in this field.