We present a novel, challenging, high-resolution stereo dataset of indoor scenes annotated with dense and accurate ground-truth disparities. A distinguishing feature of our dataset is the presence of several specular and transparent surfaces, which are the main causes of failure for state-of-the-art stereo networks. Our acquisition pipeline leverages a novel deep space-time stereo framework that allows for easy and accurate labeling with sub-pixel precision. We release a total of 419 samples collected in 64 different scenes and annotated with dense ground-truth disparities. Each sample includes a high-resolution pair (12 Mpx) as well as an unbalanced pair (Left: 12 Mpx, Right: 1.1 Mpx). Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. We evaluate state-of-the-art deep networks on our dataset, highlighting their limitations in addressing the open challenges in stereo and drawing hints for future research.