We propose an embarrassingly simple but very effective scheme for high-quality dense stereo reconstruction: (i) generate an approximate reconstruction with your favourite stereo matcher; (ii) rewarp the input images with that approximate model; (iii) with the initial reconstruction and the warped images as input, train a deep network to enhance the reconstruction by regressing a residual correction; and (iv) if desired, iterate the refinement with the new, improved reconstruction. Learning only the residual greatly simplifies the learning problem. A standard U-Net without bells and whistles is enough to reconstruct even small surface details, like dormers and roof substructures in satellite images. We also investigate residual reconstruction with less information and find that even a single image is enough to greatly improve an approximate reconstruction. Our full model reduces the mean absolute error of state-of-the-art stereo reconstruction systems by more than 50%, both in our target domain of satellite stereo and on stereo pairs from the ETH3D benchmark.
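Steps (i) to (iv) above can be sketched as a short refinement loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the reconstruction is treated as a horizontal disparity map of a rectified pair, the warp is a nearest-neighbour column shift, and `predict_residual` is a hypothetical callable standing in for the trained U-Net. The names `warp_to_reference` and `refine` are likewise illustrative.

```python
import numpy as np


def warp_to_reference(image, disparity):
    """Resample `image` with a per-pixel horizontal shift (nearest neighbour).

    Assumption: a rectified pair, so the approximate model acts as a
    column offset. Out-of-range columns are clamped to the image border.
    """
    h, w = disparity.shape
    cols = np.arange(w)[None, :] - np.rint(disparity).astype(int)
    cols = np.clip(cols, 0, w - 1)
    rows = np.arange(h)[:, None]
    return image[rows, cols]


def refine(initial, reference, other, predict_residual, n_iters=2):
    """Iteratively add a predicted residual to an approximate reconstruction."""
    estimate = np.asarray(initial, dtype=float)
    for _ in range(n_iters):
        # (ii) rewarp the second image with the current approximate model
        warped = warp_to_reference(other, estimate)
        # (iii) regress a residual correction from the estimate and images
        residual = predict_residual(estimate, reference, warped)
        # (iv) update the reconstruction and, if desired, iterate
        estimate = estimate + residual
    return estimate


# Toy demonstration with synthetic data (no real stereo pair, no training).
rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 4.0, size=(8, 16))
initial = truth + rng.normal(0.0, 1.0, size=truth.shape)  # (i) stand-in for a stereo matcher
left = rng.uniform(size=truth.shape)
right = rng.uniform(size=truth.shape)


def oracle(estimate, reference, warped):
    # Hypothetical stand-in for the network: removes half of the remaining
    # error by peeking at the ground truth. A trained model would infer the
    # residual from `estimate`, `reference` and `warped` instead.
    return 0.5 * (truth - estimate)


refined = refine(initial, left, right, oracle, n_iters=2)
mae_before = np.abs(initial - truth).mean()
mae_after = np.abs(refined - truth).mean()
```

With this toy oracle each pass halves the remaining error, so two iterations leave a quarter of the initial mean absolute error. The point of the sketch is only the data flow: the predictor never outputs the reconstruction itself, only a correction to the current estimate.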