Stereo matching is an important problem in computer vision that has drawn tremendous research attention for decades. In recent years, data-driven methods based on convolutional neural networks (CNNs) have continuously pushed stereo matching to new heights. However, data-driven methods require large amounts of training data, which are hard to obtain for real stereo scenes because per-pixel ground-truth disparity is difficult to annotate. Although synthetic datasets have been proposed to meet this demand, fine-tuning on real datasets is still needed because of the domain gap between synthetic and real data. In this paper, we find that, in synthetic datasets, close-to-real-scene texture rendering is a key factor in boosting stereo matching performance, while close-to-real-scene 3D modeling is less important. We then propose semi-synthetic, an effective and fast way to synthesize large amounts of data with close-to-real-scene texture, in order to minimize the gap between synthetic and real data. Extensive experiments demonstrate that models trained on our proposed semi-synthetic datasets achieve significantly better performance than models trained on general synthetic datasets, especially on real-data benchmarks with limited training data. With further fine-tuning on real datasets, we also achieve state-of-the-art (SOTA) performance on Middlebury and competitive results on the KITTI and ETH3D datasets.