Omnidirectional multi-view stereo (MVS) vision is attractive for its ultra-wide field-of-view (FoV), enabling machines to perceive their 360{\deg} 3D surroundings. However, existing solutions require expensive dense depth labels for supervision, making them impractical in real-world applications. In this paper, we propose the first unsupervised omnidirectional MVS framework based on multiple fisheye images. To this end, we project all images to a virtual view center and composite two panoramic images with spherical geometry from two pairs of back-to-back fisheye images. The two 360{\deg} images form a stereo pair with a special pose, and photometric consistency is leveraged to establish the unsupervised constraint, which we term "Pseudo-Stereo Supervision". In addition, we propose Un-OmniMVS, an efficient unsupervised omnidirectional MVS network, whose two efficient components speed up inference. First, a novel feature extractor with frequency attention simultaneously captures non-local Fourier features and local spatial features, explicitly enriching the feature representation. Second, a variance-based light cost volume is put forward to reduce the computational complexity. Experiments show that the performance of our unsupervised solution is competitive with that of state-of-the-art (SoTA) supervised methods, with better generalization to real-world data.
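The variance-based cost volume mentioned above is commonly built by measuring, at each depth hypothesis, how consistent the warped features from all views are: perfectly matching views give zero variance (low cost). A minimal NumPy sketch of this aggregation, assuming the fisheye features have already been warped onto shared spherical depth hypotheses (the function name and tensor shapes are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def variance_cost_volume(warped_feats):
    """Aggregate multi-view features into a cost volume via per-voxel variance.

    warped_feats: (V, C, D, H, W) array -- features from V views, each warped
    onto D depth hypotheses of an H x W panorama (shapes are hypothetical).
    Returns a (D, H, W) single-channel cost volume: the variance across views,
    averaged over feature channels, so consistent views yield near-zero cost.
    """
    mean = warped_feats.mean(axis=0, keepdims=True)   # (1, C, D, H, W) per-voxel mean
    var = ((warped_feats - mean) ** 2).mean(axis=0)   # (C, D, H, W) variance across views
    return var.mean(axis=0)                           # (D, H, W) channel-averaged ("light") volume
```

Averaging over the channel dimension is one plausible reading of "light": it leaves a single-channel volume for regularization instead of a full C-channel one, which is where the complexity saving would come from.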