Recovering the spatial layout of the cameras and the geometry of the scene from extreme-view images is a longstanding challenge in computer vision. Prevailing 3D reconstruction algorithms typically adopt the image-matching paradigm and presume that part of the scene is co-visible across images, so they perform poorly when the inputs overlap little. In contrast, humans can associate visible parts in one image with the corresponding invisible components in another image by drawing on prior knowledge of shapes. Inspired by this observation, we present a novel concept called virtual correspondences (VCs). A VC is a pair of pixels from two images whose camera rays intersect in 3D. Like classic correspondences, VCs conform to epipolar geometry; unlike classic correspondences, VCs need not be co-visible across views. Therefore, VCs can be established and exploited even when the images do not overlap. We introduce a method to find virtual correspondences based on humans in the scene. We show how VCs can be seamlessly integrated with classic bundle adjustment to recover camera poses across extreme views. Experiments show that our method significantly outperforms state-of-the-art camera pose estimation methods in challenging scenarios and performs comparably in the traditional, densely captured setting. Our approach also opens up multiple downstream tasks in extreme-view scenarios, such as scene reconstruction via multi-view stereo and novel view synthesis.
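Because a VC is defined purely by ray geometry, the epipolar constraint it satisfies can be checked numerically. The sketch below is illustrative only and is not the authors' method: it does not find VCs (the paper does that using humans in the scene). It takes a hypothetical two-camera rig facing each other from opposite sides of a subject, projects one 3D point into both views, and confirms that the resulting pixel pair has a zero epipolar residual x2^T E x1 and a zero ray-to-ray distance, while an unrelated pair does not. The intrinsics (`F_PX`, `CX`, `CY`), poses, 3D points, and all helper names are assumptions invented for this example. Occlusion is not modeled; we simply assume the back-surface point is hidden from camera 1 behind the subject's front. The two residuals are generic two-view constraints of the kind a bundle adjuster could minimize; how the paper formulates its VC term is not reproduced here.

```python
import math

# --- tiny 3x3 linear-algebra helpers (pure Python, no NumPy) ---
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def skew(t):
    # [t]_x such that skew(t) @ v == cross(t, v)
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def rot_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Hypothetical pinhole intrinsics shared by both cameras.
F_PX, CX, CY = 500.0, 320.0, 240.0

def project(X_cam):
    """Pixel of a point given in that camera's coordinates."""
    return (F_PX * X_cam[0] / X_cam[2] + CX, F_PX * X_cam[1] / X_cam[2] + CY)

def normalize(px):
    """Pixel -> ray direction in normalized camera coordinates (K^-1 [u, v, 1])."""
    return [(px[0] - CX) / F_PX, (px[1] - CY) / F_PX, 1.0]

# Hypothetical extreme-view rig: camera 1 at the origin looking down +z,
# camera 2 at z = 10 on the opposite side of the subject, facing camera 1.
# Convention: camera-1 coordinates X1 map to camera-2 coordinates X2 = R X1 + t.
R = rot_y(math.pi)
t = [0.0, 0.0, 10.0]
E = matmul(skew(t), R)  # essential matrix: x2n^T E x1n = 0 when the two rays meet

def to_cam2(X):
    return [a + b for a, b in zip(matvec(R, X), t)]

def epipolar_residual(px1, px2):
    return dot(normalize(px2), matvec(E, normalize(px1)))

def ray_distance(px1, px2):
    """Shortest distance between the two back-projected rays (0 if they intersect)."""
    c2 = [-v for v in matvec(transpose(R), t)]  # camera-2 centre in camera-1 frame
    d1 = normalize(px1)
    d2 = matvec(transpose(R), normalize(px2))   # camera-2 ray rotated into camera-1 frame
    n = cross(d1, d2)
    return abs(dot(c2, n)) / math.sqrt(dot(n, n))

# Hypothetical point on the subject's back: seen by camera 2, assumed hidden
# from camera 1, whose pixel on the same ray shows the subject's front instead.
X_back = [0.1, 0.2, 5.15]
px1 = project(X_back)
px2 = project(to_cam2(X_back))

# An unrelated point, used to build a pair whose rays do not meet.
X_other = [0.5, -0.3, 5.0]
px2_wrong = project(to_cam2(X_other))

vc_epi, vc_ray = epipolar_residual(px1, px2), ray_distance(px1, px2)
bad_epi, bad_ray = epipolar_residual(px1, px2_wrong), ray_distance(px1, px2_wrong)

print(f"virtual correspondence: epipolar={vc_epi:.2e}, ray distance={vc_ray:.2e}")
print(f"mismatched pair:        epipolar={bad_epi:.2e}, ray distance={bad_ray:.2e}")
```

The virtual correspondence yields residuals at floating-point noise level, while the mismatched pair gives clearly nonzero values. In a pose-estimation setting, R and t would be the unknowns, and residuals like these would be accumulated over many VCs alongside ordinary keypoint matches.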